
Solving Math Word Problem with External Knowledge and Entailment Loss

  • Conference paper
  • First Online: 23 September 2023

  • Rizhongtian Lu,
  • Yongmei Tan,
  • Shaozhang Niu &
  • Yunze Lin

Beijing University of Posts and Telecommunications, Beijing, China

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14262)
Included in the following conference series:

  • International Conference on Artificial Neural Networks


Automatic math word problem (MWP) solving has become an interesting task for NLP researchers in recent years, and a growing number of effective sequence-to-sequence deep learning models have been proposed. However, these models do not adequately account for factual errors: a sequence-to-sequence model can produce expressions that do not appear in the question. They also neglect external knowledge during the problem-solving process. To address these problems, we propose a model that automatically solves math word problems with External Knowledge and Entailment Loss (MathEE). MathEE uses a textual-entailment auxiliary task to identify factual errors and introduces an entity graph based on external knowledge to model the highly relevant entity words in the question. Experimental results on the publicly available Chinese datasets Ape210K and Math23K show that MathEE achieves accuracies of 74.43% and 78.7%, respectively, 2.08 and 1.6 points higher than strong baseline models.




Lu, R., Tan, Y., Niu, S., Lin, Y. (2023). Solving Math Word Problem with External Knowledge and Entailment Loss. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14262. Springer, Cham. https://doi.org/10.1007/978-3-031-44201-8_27




Mapping to Declarative Knowledge for Word Problem Solving

TACL 2018 · Subhro Roy, Dan Roth

Math word problems form a natural abstraction to a range of quantitative reasoning problems, such as understanding financial news, sports results, and casualties of war. Solving such problems requires the understanding of several mathematical concepts such as dimensional analysis and subset relationships. In this paper, we develop declarative rules which govern the translation of natural language descriptions of these concepts to math expressions. We then present a framework for incorporating such declarative knowledge into word problem solving. Our method learns to map arithmetic word problem text to math expressions by learning to select the relevant declarative knowledge for each operation of the solution expression. This provides a way to handle multiple concepts in the same problem while, at the same time, supporting interpretability of the answer expression. Our method models the mapping to declarative knowledge as a latent variable, thus removing the need for expensive annotations. Experimental evaluation suggests that our domain knowledge based solver outperforms all other systems, and that it generalizes better in the realistic case where the training data it is exposed to is biased differently than the test data.

1 Introduction

Many natural language understanding situations require reasoning with respect to numbers or quantities – understanding financial news, sports results, or the number of casualties in a bombing. Math word problems form a natural abstraction for many of these quantitative reasoning problems. Consequently, there has been growing interest in developing automated methods to solve math word problems [Kushman et al., 2014, Hosseini et al., 2014, Roy and Roth, 2015].

Understanding and solving math word problems involves interpreting natural language descriptions of mathematical concepts, as well as understanding their interaction with the physical world. Consider the elementary school level arithmetic word problem shown in Fig 1. To solve the problem, one needs to understand that “apple pies” and “pecan pies” are kinds of “pies”, and hence the numbers of apple pies and pecan pies need to be summed up to get the total number of pies. Similarly, detecting that “5” represents “the number of pies per row” and applying dimensional analysis (unit compatibility) knowledge helps us infer that the total number of pies needs to be divided by 5 to get the answer. Besides part-whole relationships and dimensional analysis, several other concepts are needed to support reasoning in math word problems. Some of these involve understanding comparisons, transactions, and the application of math or physics formulas. Most of this knowledge can be encoded as declarative rules, as illustrated in this paper.

This paper introduces a framework for incorporating this “declarative knowledge” into word problem solving. We focus on arithmetic word problems, whose solution can be obtained by combining the numbers in the problem with basic operations (addition, subtraction, multiplication or division). For combining a pair of numbers or math sub-expressions, our method first predicts the math concept that is needed for it (e.g., subset relationship, dimensional analysis, etc.), and then predicts a declarative rule under that concept to infer the mathematical operation. We model the selection of declarative rules as a latent variable, which removes the need for expensive annotations for the intermediate steps.

The proposed approach has some clear advantages compared to existing work on word problem solving. First, it provides interpretability of the solution, without expensive annotations. Our method selects a declarative knowledge based inference rule for each operation needed in the solution. These rules provide an explanation for the operations performed. In particular, it learns to select relevant rules without explicit annotations for them. Second, each individual operation in the solution expression can be generated independently by a separate mathematical concept. This allows our method to handle multiple concepts in the same problem.

We show that existing datasets of arithmetic word problems suffer from significant vocabulary biases and, consequently, existing solvers do not do well on conceptually similar problems that are not biased in the same way. Our method, on the other hand, learns the right abstractions even in the presence of biases in the data. We also introduce a novel approach to gather word problems without these biases, creating a new dataset of 1492 problems.

The next section discusses related work. We next introduce the mathematical concepts required for arithmetic word problems, as well as the declarative rules for each concept. Section 4 describes our model – how we predict answers using declarative knowledge – and provides the details of our training paradigm. Finally, we provide experimental evaluation of our proposed method in Section 6 , and then conclude with a discussion of future work.

2 Related Work

Our work is primarily related to three major strands of research: automatic word problem solving, semantic parsing, and approaches that incorporate background knowledge in learning.

2.1 Automatic Word Problem Solving

There has been a growing interest in automatically solving math word problems, with various systems focusing on particular types of problems. These can be broadly categorized into two types: arithmetic and algebra.

Arithmetic Word Problems Arithmetic problems involve combining numbers with basic operations (addition, subtraction, multiplication and division), and are generally directed towards elementary school students. [?] and [?], as well as this work, focus on this class of word problems. The works of [?] and [?] focus on arithmetic problems involving only addition and subtraction. Some of these approaches also try to incorporate some form of declarative or domain knowledge: [?] incorporates the transfer phenomenon by classifying verbs, and [?] maps problems to a set of formulas. Both require extensive annotations for intermediate steps (verb classification for the former, alignment of numbers to formulas for the latter). In contrast, our method can handle a more general class of problems, while training requires only problem-equation pairs coupled with rate component annotations. [?] focuses only on using dimensional analysis knowledge, and handles the same class of problems as we do. In contrast, our method provides a framework for including any form of declarative knowledge, exemplified here by incorporating common concepts required for arithmetic problems.

Algebra Word Problems Algebra word problems are characterized by the use of one or more variables in constructing one or more equations. These are typically middle or high school problems. [?] looks at single equation problems, and [?] focuses on number word problems. [?] introduces a template based approach to handle general algebra word problems, and several works have since proposed improvements over this approach [Zhou et al., 2015, Upadhyay et al., 2016, Huang et al., 2017]. There has also been work on generating rationales for word problem solving [Ling et al., 2017]. More recently, some focus has turned to pre-university exam questions [Matsuzaki et al., 2017, Hopkins et al., 2017], which require handling a wider range of problems and often more complex semantics.

2.2 Semantic Parsing

Our work is also related to learning semantic parsers from indirect supervision [ Clarke et al., 2010 , Liang et al., 2011 ] . The general approach here is to learn a mapping of sentences to logical forms, with the only supervision being the response of executing the logical form on a knowledge base. Similarly, we learn to select declarative rules from supervision that only includes the final operation (and not which rule generated it). However, in contrast to the semantic parsing work, in our case the selection of each declarative rule usually requires reasoning across multiple sentences. Also, we do not require an explicit grounding of words or phrases to logical variables.

2.3 Background Knowledge in Learning

Approaches to incorporating knowledge in learning started with Explanation-Based Learning (EBL) [DeJong, 1993, DeJong, 2014]. EBL uses domain knowledge based on observable predicates, whereas we learn to map text to the predicates of our declarative knowledge. More recent approaches have tried to incorporate knowledge in the form of constraints or expectations on the output [Roth and Yih, 2004, Chang et al., 2007, Chang et al., 2012, Ganchev et al., 2010, Smith and Eisner, 2006, Naseem et al., 2010, Bisk and Hockenmaier, 2012, Gimpel and Bansal, 2014].

Finally, we note that there has been some work in the context of Question Answering on perturbing questions or answers as a way to test or assure the robustness of an approach, or the lack of it [Khashabi et al., 2016, Jia and Liang, 2017]. We make use of similar ideas in order to generate an unbiased test set for math word problems (Sec. 6).

3 Knowledge Representation

We introduce here our representation of domain knowledge. We organize the knowledge hierarchically in two levels – concepts and declarative rules. A math concept is a phenomenon which needs to be understood to apply reasoning over quantities; examples of concepts include part-whole relations and dimensional analysis. Under each concept, there are a few declarative rules, which dictate which operation is needed in a particular context. An example of a declarative rule under the part-whole concept is: if two numbers quantify “parts” of a larger quantity, the operation between them must be addition. These rules use concept specific predicates, which we exemplify in the following subsections.

Since this work focuses on arithmetic word problems, we consider the 4 math concepts which are most common in these problems, as follows:

Transfer : This involves understanding the transfer of objects from one person to another. For example, the action described by the sentence “Tim gave 5 apples to Jim”, results in Tim losing “5 apples” and Jim gaining “5 apples”.

Dimensional Analysis : This involves understanding compatibility of units or dimensions. For example, “30 pies” can be divided by “5 pies per row” to get the number of rows.

Part-Whole Relation : This includes asserting that if two numbers quantify parts of a larger quantity, they are to be added. For example, the problem in Section 1 involves understanding “pecan pies” and “apple pies” are parts of “pies”, and hence must be added.

Explicit Math : Word problems often mention explicit math relationships among quantities or entities in the problem. For example, “Jim is 5 inches taller than Tim”. This concept captures the reasoning needed for such relationships.

Each of these concepts comprises a small number of declarative rules which determine the math operations; we describe them below.
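
To make this two-level organization concrete, here is a minimal sketch (ours, not from the authors' released codebase; all names are illustrative) of how concepts and their declarative rules might be represented:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DeclarativeRule:
    """A predicate over an extracted context (verb classes, units,
    coreference decisions, ...) that, when true, dictates an operation."""
    name: str
    condition: Callable[[dict], bool]
    operation: str  # "+", "-", "*", or "/"

# Each concept owns a small set of rules; the counts match the text
# (3 for dimensional analysis, 7 for explicit math, 3 for part-whole).
CONCEPTS: Dict[str, List[DeclarativeRule]] = {
    "part_whole": [
        # "if two numbers quantify parts of a larger quantity, add them"
        DeclarativeRule("sibling", lambda c: c["sibling(num1, num2)"], "+"),
        # ... hyponym / hypernym rules
    ],
    "transfer": [],              # populated with the rules of Section 3.1
    "dimensional_analysis": [],  # Section 3.2
    "explicit_math": [],         # Section 3.3
}
```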

3.1 Transfer

Consider the following excerpt of a word problem exhibiting a transfer phenomenon: “Stephen owns 5 books. Daniel gave him 4 books.” The goal of the declarative rules is to determine which operation is required between 5 and 4, given that we know a transfer is taking place. We note that a transfer usually involves two entities, which occur as subject and indirect object in a sentence. Also, the direction of transfer is determined by the verbs associated with the entities. We define a set of variables to denote these properties: Subj1, Verb1 and IObj1 denote the subject, verb and indirect object associated with the first number, and Subj2, Verb2 and IObj2 those associated with the second number. For the above example, the assignment of the variables is shown below:

[Stephen]_Subj1 [owns]_Verb1 5 books. [Daniel]_Subj2 [gave]_Verb2 [him]_IObj2 4 books.

In order to determine the direction of transfer, we require some classification of verbs. In particular, we classify each verb into one of five classes: HAVE, GET, GIVE, CONSTRUCT and DESTROY. The HAVE class consists of all verbs which signify the state of an entity, such as “have”, “own”, etc. The GET class contains verbs which indicate the gaining of things by the subject, such as “acquire”, “borrow”, etc. The GIVE class contains verbs which indicate the loss of things by the subject; verbs like “lend” and “give” belong to this class. Finally, the CONSTRUCT class contains verbs indicating construction or creation, like “build” and “fill”, while the DESTROY class contains destruction-related verbs like “destroy”, “eat” and “use”. This verb classification is largely based on the work of [Hosseini et al., 2014].

Finally, the declarative rules for this concept have the following form:

$[\text{Verb1} \in \text{HAVE}] \wedge [\text{Verb2} \in \text{GIVE}] \wedge [\text{Coref}(\text{Subj1}, \text{IObj2})] \Rightarrow \text{Addition}$

where $\text{Coref}(A, B)$ is true when $A$ and $B$ represent the same entity or are coreferent, and is false otherwise. In the example above, Verb1 is “own”, so $[\text{Verb1} \in \text{HAVE}]$ is true; Verb2 is “give”, so $[\text{Verb2} \in \text{GIVE}]$ is true. Finally, Subj1 and IObj2 both refer to Stephen, so $[\text{Coref}(\text{Subj1}, \text{IObj2})]$ returns true. As a result, the above declarative rule dictates that addition should be performed between 5 and 4.
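
As a minimal illustration of how this rule fires (our sketch; the verb lexicon is abbreviated, and the coreference decision is hard-coded here rather than produced by the learned coreference model):

```python
VERB_CLASSES = {
    "own": "HAVE", "have": "HAVE",
    "acquire": "GET", "borrow": "GET",
    "give": "GIVE", "lend": "GIVE",
    "build": "CONSTRUCT", "fill": "CONSTRUCT",
    "destroy": "DESTROY", "eat": "DESTROY", "use": "DESTROY",
}

def transfer_rule(verb1, verb2, coref_subj1_iobj2):
    """[Verb1 in HAVE] and [Verb2 in GIVE] and Coref(Subj1, IObj2) => Addition."""
    if (VERB_CLASSES.get(verb1) == "HAVE"
            and VERB_CLASSES.get(verb2) == "GIVE"
            and coref_subj1_iobj2):
        return "+"
    return None  # this rule does not fire; another transfer rule may apply

# "Stephen owns 5 books. Daniel gave him 4 books."
# Verb1 = "own", Verb2 = "give", Subj1 ("Stephen") corefers with IObj2 ("him")
assert transfer_rule("own", "give", coref_subj1_iobj2=True) == "+"  # 5 + 4
```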

3.2 Dimensional Analysis

We now look at the use of dimensional analysis knowledge in word problem solving. To use dimensional analysis, one needs to extract the units of numbers as well as the relations between the units. Consider the following excerpt of a word problem: “Stephen has 5 bags. Each bag has 4 apples.” Knowing that the unit of 5 is “bag” and the effective unit of 4 is “apples per bag” allows us to infer that the numbers can be multiplied to obtain the total number of apples.

To capture these dependencies, we first introduce a few terms. Whenever a number has a unit of the form “A per B”, we refer to “A” as the unit of the number, and to “B” as the rate component of the number. In our example, the unit of 4 is “apple”, and the rate component of 4 is “bag”. We define variables Unit1 and Rate1 to denote the unit and the rate component of the first number, and similarly Unit2 and Rate2 for the second. For the above example, the assignment of variables is shown below:

Stephen has 5 [bags]_Unit1. Each [bag]_Rate2 has 4 [apples]_Unit2.

Finally, the declarative rule applicable for our example has the following form:

$[\text{Coref}(\text{Unit1}, \text{Rate2})] \Rightarrow \text{Multiplication}$

We have only 3 rules for dimensional analysis; they generate multiplication or division operations.

3.3 Explicit Math

In this subsection, we want to capture the reasoning behind explicit math relationships expressed in word problems such as: “Stephen has 5 apples. Daniel has 4 more apples than Stephen.” We denote by Math1 and Math2 any explicit math term associated with the first and second numbers respectively. As was the case for transfers, we also define Subj1, IObj1, Subj2 and IObj2 to denote the entities participating in the math relationship. The assignment of these variables in our example is:

[Stephen]_Subj1 has 5 apples. [Daniel]_Subj2 has 4 [more apples than]_Math2 [Stephen]_IObj2.

We classify explicit math terms into one of three classes: ADD, SUB and MUL. ADD comprises terms for addition, like “more than”, “taller than” and “heavier than”. SUB consists of terms for subtraction, like “less than”, “shorter than”, etc., and MUL contains terms indicating multiplication, like “times”, “twice” and “thrice”. Finally, the declarative rule that applies to our example is:

$[\text{Coref}(\text{Subj1}, \text{IObj2})] \wedge [\text{Math2} \in \text{ADD}] \Rightarrow \text{Addition}$

We have only 7 rules for explicit math.

3.4 Part-Whole Relation

Understanding a part-whole relationship entails understanding whether two quantities are hyponyms, hypernyms or siblings (that is, co-hyponyms, parts of the same quantity). For example, in the excerpt “Mrs. Hilt has 5 pecan pies and 4 apple pies”, determining that pecan pies and apple pies are parts of all pies helps infer that addition is needed. We have 3 simple rules which map directly from Hyponym, Hypernym or Sibling detection to the corresponding math operation. For the above example, the applicable declarative rule is:

$[\text{Sibling}(\text{Number1}, \text{Number2})] \Rightarrow \text{Addition}$

The rules for the part-whole concept can generate addition and subtraction operations. Table 1 gives a list of all the declarative rules. Note that all the declarative rules are designed to determine an operation between two numbers only; we introduce a strategy in Section 4 which facilitates combining sub-expressions with these rules.

4 Mapping of Word Problems to Declarative Knowledge

Given an input arithmetic word problem $x$, the goal is to predict the math expression $y$ which generates the correct answer. In order to derive the expression $y$ from the word problem $x$, we leverage the math concepts and declarative rules introduced in Section 3. To combine two numbers mentioned in $x$, we first predict a concept $k$, and then choose a declarative knowledge rule $r$ from $k$. The rule $r$ generates the math operation needed to combine the two numbers. Consider the first example in Table 2. To combine 6 and 9, we first decide on the transfer concept, and then choose an appropriate rule under transfer to generate the operation.

To combine the resulting sub-expression (6+9) with 3, we need a representative number for (6+9).

The representative number for a sub-expression is chosen such that it preserves the reasoning needed for the combination of this sub-expression with other numbers. We follow a heuristic to choose a representative number from a sub-expression (a code sketch follows the list below):

For transfers and part-whole relationship, we choose the representative number of the left subtree.

In case of rate relationship, we choose the number which does not have a rate component.

In case of explicit math, we choose the number which is not directly associated with the explicit math expression.
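
The heuristic reduces to a small case analysis. A sketch (ours; the node attributes `left`, `right`, `has_rate` and `has_explicit_math` are assumptions about how sub-expressions might be represented):

```python
def representative(expr, concept):
    """Choose the representative number of sub-expression `expr`, whose
    operands `expr.left` and `expr.right` are numbers or sub-expressions
    already reduced to their representative numbers."""
    if concept in ("transfer", "part_whole"):
        return expr.left  # the representative number of the left subtree
    if concept == "dimensional_analysis":
        # choose the operand that does not have a rate component
        return expr.right if expr.left.has_rate else expr.left
    if concept == "explicit_math":
        # choose the operand not directly associated with the math term
        return expr.right if expr.left.has_explicit_math else expr.left
    raise ValueError(f"unknown concept: {concept}")
```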

4.1 Scoring Answer Derivations

Given the input word problem $x$, the solution math expression $y$ is constructed by combining numbers in $x$ with operations. We refer to the set of operations used in an expression $y$ as $\odot(y)$. Each operation $o$ in $\odot(y)$ is generated by first choosing a concept $k^o$, and then selecting a declarative rule $r^o$ from that concept.

In order to discriminate between multiple candidate solution expressions of a word problem $x$, we score them using a linear model over features extracted from the derivation of the solution. Our scoring function has the following form:

$\operatorname{Score}(x, y) = \sum_{o \in \odot(y)} \left( w_k \phi_k(x, k^o) + w_r \phi_r(x, r^o) \right)$

where $\phi_k(x, k^o)$ and $\phi_r(x, r^o)$ are feature vectors related to concept $k^o$ and declarative rule $r^o$, respectively, and $w_k$ and $w_r$ are the corresponding weight vectors. The term $w_k \phi_k(x, k^o)$ is the score for the selection of $k^o$, and the term $w_r \phi_r(x, r^o)$ is the score for the selection of $r^o$. The total score is the sum of the scores of all concept and rule choices, over all operations of $y$.
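
In code, the score is a plain sum of per-operation concept and rule scores. A minimal sketch, assuming `y.operations` yields a (concept, rule) pair per operation, and `phi_k`, `phi_r` and `dot` stand in for the feature extractors of Section 5.2 and a sparse dot product:

```python
def score(x, y, w_k, w_r):
    """Score(x, y): sum over all operations o of y of
    w_k . phi_k(x, k_o)  +  w_r . phi_r(x, r_o)."""
    return sum(dot(w_k, phi_k(x, k_o)) + dot(w_r, phi_r(x, r_o))
               for k_o, r_o in y.operations)
```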

4.2 Learning

We wish to estimate the parameters of the weight vectors $w_k$ and $w_r$, such that our scoring function assigns a higher score to the correct math expression, and a lower score to other competing math expressions. For learning the parameters, we assume access to word problems paired with the correct math expression. We show in Section 5 that certain simple heuristics and rate component annotations can be used to create somewhat noisy annotations for the concepts needed for individual operations. Hence, our formulation assumes access to concept supervision as well: we assume access to $m$ examples of the form $\{(x_1, y_1, \{k^o\}_{o \in \odot(y_1)}), (x_2, y_2, \{k^o\}_{o \in \odot(y_2)}), \ldots, (x_m, y_m, \{k^o\}_{o \in \odot(y_m)})\}$.

We do not have any supervision for declarative rule selection, which we model as a latent variable.

Two Stage Learning: A straightforward solution to our learning problem could be to jointly learn $w_k$ and $w_r$ using a latent structured SVM. However, we found that this model does not perform well. Instead, we chose a two stage learning protocol. At the first stage, we only learn $w_r$, the weight vector for scoring the declarative rule choice. Once learned, we fix the parameters of $w_r$, and then learn the parameters of $w_k$.

In order to learn the parameters of $w_r$, we solve a latent structured SVM objective of the following form:

$\min_{w_r} \; \frac{\lambda}{2} \|w_r\|^2 + \sum_{i=1}^{m} \sum_{o \in \odot(y_i)} \Big[ \max_{\hat{r} \in k^o} \big( w_r \phi_r(x_i, \hat{r}) + \Delta(\hat{o}, o) \big) - \max_{\hat{r} \in k^o, \, \hat{r} \Rightarrow o} w_r \phi_r(x_i, \hat{r}) \Big]$

where $\hat{r} \in k^o$ implies that $\hat{r}$ is a declarative rule for concept $k^o$, $\hat{r} \Rightarrow o$ signifies that the declarative rule $\hat{r}$ generates operation $o$ (we write $\hat{o}$ for the operation generated by $\hat{r}$), and $\Delta(\hat{o}, o)$ represents a measure of dissimilarity between operations $o$ and $\hat{o}$. For each operation $o$ in the solution expression $y_i$, the objective tries to minimize the difference between the highest scoring rule from its concept $k^o$, and the highest scoring rule from $k^o$ which explains, or generates, the operation $o$.

Next we fix the parameters of $w_r$, and solve the analogous objective for $w_k$:

$\min_{w_k} \; \frac{\lambda}{2} \|w_k\|^2 + \sum_{i=1}^{m} \sum_{o \in \odot(y_i)} \Big[ \max_{\hat{k}, \, \hat{r} \in \hat{k}} \big( w_k \phi_k(x_i, \hat{k}) + w_r \phi_r(x_i, \hat{r}) + \Delta(\hat{o}, o) \big) - \max_{\hat{r} \in k^o, \, \hat{r} \Rightarrow o} \big( w_k \phi_k(x_i, k^o) + w_r \phi_r(x_i, \hat{r}) \big) \Big]$

This is equivalent to a standard structured SVM objective. We use a 0-1 loss for $\Delta(\hat{o}, o)$. Note that fixing the parameters of $w_r$ determines the scores for rule selection, removing the need for any latent variables at this stage.

4.3 Inference

Given an input word problem $x$, inferring the best math expression involves computing $\arg\max_{y \in \mathcal{Y}} \operatorname{Score}(x, y)$, where $\mathcal{Y}$ is the set of all math expressions that can be created by combining the numbers in $x$ with basic math operations.

The size of $\mathcal{Y}$ is exponential in the number of quantities mentioned in $x$. As a result, we perform approximate inference using beam search. We initialize the beam with the set $E$ of all numbers mentioned in the problem $x$. At each step of the beam search, we choose two numbers (or sub-expressions) $e_1$ and $e_2$ from $E$, and then select a math concept and a declarative rule to infer an operation $o$. We create a new sub-expression $e_3$ by combining $e_1$ and $e_2$ with operation $o$. We then create a new set $E'$ from $E$ by removing $e_1$ and $e_2$ and adding $e_3$. We remove $E$ from the beam, and add all such modified sets $E'$ to the beam. We continue this process until all sets in the beam have only one element in them, and choose the highest scoring expression among these elements as the solution expression.
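
A sketch of this beam search (ours, not the released implementation; `rule.apply`, `combine` and `score_step` are assumed helpers wrapping the rule predicates, sub-expression construction, and the per-operation scoring of Section 4.1):

```python
import itertools

def beam_search(x, numbers, score_step, beam_size=1000):
    """Approximate arg-max over expression trees.  Each beam state is a
    (list of sub-expressions, partial score) pair; a state is finished
    once its list has been combined down to a single expression."""
    beam = [(list(numbers), 0.0)]
    finished = []
    while beam:
        candidates = []
        for exprs, total in beam:
            if len(exprs) == 1:
                finished.append((exprs[0], total))
                continue
            for e1, e2 in itertools.combinations(exprs, 2):
                for concept, rules in CONCEPTS.items():
                    for rule in rules:
                        op = rule.apply(x, e1, e2)  # None if rule doesn't fire
                        if op is None:
                            continue
                        e3 = combine(e1, e2, op)
                        rest = [e for e in exprs if e is not e1 and e is not e2]
                        candidates.append(
                            (rest + [e3], total + score_step(x, concept, rule)))
        # keep only the highest scoring partial derivations
        beam = sorted(candidates, key=lambda s: s[1], reverse=True)[:beam_size]
    return max(finished, key=lambda s: s[1])[0]
```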

5 Model and Implementation Details

5.1 Supervision

Each word problem in our dataset is annotated with the solution math expression, along with alignment of numbers from the problem to the solution expression. In addition, we also have annotations for the numbers which possess a rate component. An example is shown in Fig 2 .

This is the same level of supervision used in [Roy and Roth, 2017]. Many of the annotations can be extracted semi-automatically: the number list is extracted automatically by a number detector, and the alignments require human supervision only when the same numeric value is mentioned multiple times in the problem. Most of the rate component annotations can also be extracted automatically; see [Roy and Roth, 2017] for details.

We apply a few heuristics to obtain noisy annotations of the math concepts for operations. Consider the case of combining two numbers num1 and num2 by operation $o$. We apply the following rules (a code sketch follows them):

If we detect an explicit math pattern in the neighborhood of num1 or num2, we assign concept $k^o$ to be Explicit Math.

If $o$ is multiplication or division, and one of num1 or num2 has a rate component, we assign $k^o$ to be Dimensional Analysis.

If $o$ is addition or subtraction, we check whether the dependent verbs of both numbers are identical. If they are, we assign $k^o$ to be Part-Whole relationship; otherwise, we assign it to be Transfer. We extract the dependent verb using the Stanford dependency parser [Chen and Manning, 2014].
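
These heuristics translate directly into a small labeling function. A sketch, assuming helpers `has_explicit_math_pattern`, `has_rate` and `dependent_verb` that wrap the pattern matcher, rate component annotations and dependency-parser-based verb extraction described above:

```python
def concept_annotation(o, num1, num2, problem):
    """Heuristically label the concept for combining num1 and num2 with
    operation o; returns None if no heuristic applies."""
    if (has_explicit_math_pattern(problem, num1)
            or has_explicit_math_pattern(problem, num2)):
        return "explicit_math"
    if o in ("*", "/") and (has_rate(num1) or has_rate(num2)):
        return "dimensional_analysis"
    if o in ("+", "-"):
        if dependent_verb(problem, num1) == dependent_verb(problem, num2):
            return "part_whole"
        return "transfer"
    return None
```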

The annotations obtained via these rules are of course not perfect. We could not detect certain uncommon rate patterns like “dividing the cost 4 ways” and “I read the same number of books 4 days running”. There were also part-whole relationships exhibited with complementary verbs, as in “I won 4 games, and lost 3.” Both of these cases lead to noisy math concept annotations.

However, we tested a small sample of these annotations, and found less than 5% of them to be wrong. As a result, we assume these annotations to be correct in our problem formulation.

5.2 Features

We use dependency parse labels and a small set of rules to extract the subject, indirect object, dependent verb, unit and rate component of each number mentioned in the problem. Details of these extractions can be found in the released codebase. Using these extractions, we define two feature functions $\phi_k(x, k^o)$ and $\phi_r(x, r^o)$, where $x$ is the input word problem, and $k^o$ and $r^o$ are the concept and declarative rule for operation $o$, respectively. $\phi_r(x, r^o)$ comprises the following features:

If $r^o$ contains the $\text{Coref}(\cdot)$ function, we add features related to the similarity of the arguments of $\text{Coref}(\cdot)$ (Jaccard similarity score and the presence of a pronoun in one of the arguments).

For part-whole relationships, we add indicators for a list of words like “remaining”, “rest”, “either”, “overall” and “total”, conjoined with the part-whole function in $r^o$ (Hyponymy, Hypernymy, Sibling).

Unigrams from the neighborhood of numbers being combined.

Finally, $\phi_k(x, k^o)$ generates the following features:

If $k^o$ is related to dimensional analysis, we add features indicating the presence of a rate component in the combining numbers.

If $k^o$ is part-whole, we add features indicating whether the verbs of the combining numbers are identical.

Note that these features capture several interpretable functions like coreference, hyponymy, etc.

We do not learn three components of our system: verb classification for transfer knowledge, categorization of explicit math terms, and irrelevant number detection. For verb classification, we use a seed list of around 10 verbs for each category. Given a new verb $v$, we choose the most similar verb $v'$ from the seed lists according to GloVe vector based similarity [Pennington et al., 2014], and assign $v$ the category of $v'$. This could be replaced by a learned component [Hosseini et al., 2014]; however, we found seed list based categorization to work well in most cases. For explicit math, we check for a small list of patterns to detect and categorize math terms. Note that in both cases above, we still have to learn the $\text{Coref}(\cdot)$ function to determine the final operation. Finally, to detect irrelevant numbers (numbers which are not used in the solution), we use a set of rules based on the units of numbers. Again, this could be replaced by a learned model [Roy and Roth, 2015].
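
A sketch of the nearest-seed verb classification (ours; the seed lists are abbreviated, and `glove` is assumed to map words to pre-trained GloVe vectors):

```python
import numpy as np

SEED_VERBS = {
    "HAVE": ["have", "own", "hold"],
    "GET": ["get", "acquire", "borrow", "receive"],
    "GIVE": ["give", "lend", "pay"],
    "CONSTRUCT": ["build", "make", "fill"],
    "DESTROY": ["destroy", "eat", "use"],
}

def classify_verb(verb, glove):
    """Assign `verb` the category of its most similar seed verb, by cosine
    similarity of GloVe vectors; returns None for out-of-vocabulary verbs."""
    if verb not in glove:
        return None
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    best_category, best_sim = None, -1.0
    for category, seeds in SEED_VERBS.items():
        for seed in seeds:
            if seed in glove:
                sim = cosine(glove[verb], glove[seed])
                if sim > best_sim:
                    best_category, best_sim = category, sim
    return best_category
```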

6 Experiments

6.1 Results on Existing Datasets

We first evaluate our approach on the existing datasets AllArith, AllArithLex and AllArithTmpl [Roy and Roth, 2017]. AllArithLex and AllArithTmpl are subsets of the AllArith dataset, created to test robustness to new vocabulary and to new equation forms, respectively. We compare to the top performing systems for arithmetic word problems, as follows:

Template: Template based algebra word problem solver of [Kushman et al., 2014].

LCA++: System of [Roy and Roth, 2015] based on lowest common ancestors of math expression trees.

UnitDep: Unit dependency graph based solver of [Roy and Roth, 2017].

We refer to our approach as Knowledge. For all solvers, we use the system released by the respective authors. The Template system expects an equation as the answer, whereas our dataset contains only math expressions; we converted expressions to equations by introducing a single variable and assigning the math expression to it. For example, the expression “(2 + 3)” gets converted to “X = (2 + 3)”.

The first few columns of Table 3 show the performance of the systems on the aforementioned datasets. (Results on the AllArith datasets are slightly different from [Roy and Roth, 2017], since we fixed several ungrammatical sentences in the dataset.) The performance of Knowledge is on par with or lower than some of the existing systems. We analyzed the systems, and found most of them to be not robust to perturbations of the problem text; Table 4 shows a few examples. We further analyzed the datasets, and identified several biases in the problems (in both train and test). Systems which memorize these biases get an undue advantage in evaluation. For example, the verb “give” only appears with subtraction, and hence the models learn an erroneous correlation of “give” with subtraction. Since the test set exhibits the same bias, these systems get all the “give”-related questions correct. However, they fail to solve the problem in Table 4, where “give” results in addition.

We also tested Knowledge on the addition-subtraction problems dataset released by [Hosseini et al., 2014]. It achieved a cross validation accuracy of 77.19%, which is competitive with the state of the art accuracy of 78% achieved with the same level of supervision. The system of [Mitra and Baral, 2016] achieved 86.07% accuracy on this dataset, but requires rich annotations for formulas and alignment of numbers to formulas.

6.2 New Dataset Creation

In order to remove the aforementioned biases from the dataset, we augment it with new word problems collected via a crowdsourcing platform. These new word problems are created by minimally perturbing the original problems, such that the answer is different from the original problem. For each word problem $p$ with an answer expression $a$ in our original dataset AllArith, we replace one operation in $a$ to create a new math expression $a'$. We ask annotators to modify problem $p$ minimally, such that $a'$ is now the solution to the modified word problem.

We create $a'$ from $a$ either by replacing an addition with a subtraction or vice versa, or by replacing a multiplication with a division or vice versa. We do not replace addition or subtraction with multiplication or division, since there might not be an easy perturbation that supports this conversion. We only allowed perturbed expressions which evaluate to values greater than 1. For example, we generate the expression “(3+2)” from “(3-2)”, and the expressions “(10+2)/4” and “(10-2)*4” from “(10-2)/4”. We generate all possible perturbed expressions for a given answer expression, and ask for a problem text modification for each one of them.

We show the annotators the original problem text $p$ paired with a perturbed answer $a'$. The instructions advised them to copy over the given problem text, and modify it as little as possible so that the given math expression is now the solution to the modified problem. They were also instructed not to add or delete the numbers mentioned in the problem: if the original problem mentions two “3”s and one “2”, the modified problem should also contain two “3”s and one “2”.

We manually pruned problems which did not yield the desired solution $a'$, or were too different from the input problem $p$. This procedure gave us a set of 661 new word problems, which we refer to as Perturb. Finally, we augment AllArith with the problems of Perturb, and call this new dataset Aggregate. Aggregate has a total of 1492 problems.

The addition of the Perturb problems ensures that the dataset now contains problems with similar lexical items generating different answers, which minimizes the bias discussed in subsection 6.1. To quantify this, consider the probability distribution over operations for a quantity $q$, given that word $w$ is present in the neighborhood of $q$. For an unbiased dataset, one would expect the entropy of this distribution to be high, since the presence of a single word in a number's neighborhood will seldom be completely informative for the operation. We compute the average of this entropy value over all numbers and neighborhood words in our dataset (a sketch of the computation follows). AllArith and Perturb have average entropies of 0.34 and 0.32 respectively, whereas Aggregate's average entropy is 0.54, indicating that the complete dataset is indeed significantly less biased.
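
One plausible reading of this metric, sketched below: estimate p(operation | neighborhood word) from co-occurrence counts, compute its entropy per word, and average the entropies weighted by how often each word co-occurs with a number:

```python
import math
from collections import Counter, defaultdict

def average_neighborhood_entropy(examples):
    """`examples` yields (neighborhood_words, operation) pairs, one per
    number occurrence (an assumed input format).  Higher values mean single
    neighborhood words are less predictive of the operation, i.e. the
    dataset is less biased."""
    op_counts = defaultdict(Counter)  # word -> Counter over operations
    for words, op in examples:
        for w in words:
            op_counts[w][op] += 1

    def entropy(counter):
        total = sum(counter.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in counter.values())

    weighted = [(entropy(c), sum(c.values())) for c in op_counts.values()]
    total = sum(n for _, n in weighted)
    return sum(h * n for h, n in weighted) / total
```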

6.3 Generalization from Biased Dataset

First, we evaluate the ability of systems to generalize from biased datasets. We train all systems on AllArith, and test them on Perturb (which was created by perturbing AllArith problems). The last column of Table 3 shows the performance of the systems in this setting. Knowledge outperforms all other systems in this setting, with around 19% absolute improvement over UnitDep. This shows that declarative knowledge allows the system to learn the correct abstractions, even from biased datasets.

6.4 Results on the New Dataset

Finally, we evaluate the systems on the Aggregate dataset. Following previous work [Roy and Roth, 2017], we compute two subsets of Aggregate comprising 756 problems each, using the MAWPS system [Koncel-Kedziorski et al., 2016]. The first, called AggregateLex, has low lexical repetition, and the second, called AggregateTmpl, has low repetition of equation forms. We also evaluate on these two subsets with 5-fold cross-validation. Columns 4-6 of Table 3 show the performance of the systems in this setting. Knowledge significantly outperforms the other systems on Aggregate and AggregateLex, and is similar to UnitDep on AggregateTmpl. There is a 9% absolute improvement on AggregateLex, showing that Knowledge is significantly more robust to low lexical overlap between train and test. The last column of Table 4 also shows that the other systems do not learn the right abstraction, even when trained on Aggregate.

6.5 Analysis

Coverage of the Declarative Rules We chose math concepts and declarative rules based on their prevalence in arithmetic word problems. We found that the four concepts introduced in this paper cover almost all the problems in our dataset, missing only 4 problems involving the application of area formulas. We also checked the earlier arithmetic problem datasets from the works of [Hosseini et al., 2014, Roy and Roth, 2015], and found that the math concepts and declarative rules introduced in this paper cover all their problems.

A major challenge in applying these concepts and rules to algebra word problems is the use of variables in constructing equations. Variables are often implicitly described, and it is difficult to extract units, dependent verbs, associated subjects and objects for the variables. However, we need these extractions in order to apply our declarative rules to combine variables. There has been some work to extract meaning of variables [ Roy et al., 2016 ] in algebra word problems; an extension of this can possibly support the application of rules in algebra word problems. We leave this exploration to future work.

Word problems from higher grades often require the application of math formulas, such as those related to area, interest and probability. Extending our approach to handle such problems will involve encoding math formulas in terms of concepts and rules, as well as adding concept specific features to the learned predictors. The declarative rules under the Explicit Math category currently handle only simple cases; this set needs to be augmented to handle the complex number word problems found in algebra datasets.

Gains achieved by Declarative Rules Table 5 shows examples of problems which Knowledge gets right, but UnitDep does not. The gains can be attributed to the injection of declarative knowledge. Earlier systems like UnitDep try to learn the reasoning required for these problems from the data alone. This is often difficult in the presence of limited data and noisy output from NLP tools. In contrast, we learn probabilistic models for interpretable functions like coreference, hyponymy, etc., and then use declarative knowledge involving these functions to perform reasoning. This considerably reduces the complexity of the target function to be learnt, and hence we end up with a more robust model.

Effect of Beam Size We used a beam size of 1000 in all our experiments. However, we found that varying the beam size does not affect performance significantly: even lowering the beam size to 100 reduced performance by only 1%.

Weakness of Approach A weakness of our method is the requirement to have all relevant declarative knowledge during training. Many of the component functions (like coreference) are learnt through latent alignments with no explicit annotations. If too many problems are not explained by the knowledge, the model will learn noisy alignments for the component functions.

Table 6 shows the major categories of errors with examples. 26% of the errors are due to extraneous number detection. We use a set of rules based on the units of numbers to detect such irrelevant numbers; as a result, we fail to detect numbers which are irrelevant due to other factors, like the associated entity or verb. We could potentially expand our rule based system to detect those, or replace it with a learned module like [Roy and Roth, 2015]. Another major source of errors is the parsing of rate components, that is, understanding that “earns $46 cleaning a home” should be normalized to “$46 per home”. Although we learn a model for the coreference function, we still make several mistakes related to coreference. For the example in Table 6, we fail to detect the coreference between “team member” and “people”.

7 Conclusion

In this paper, we introduce a framework for incorporating declarative knowledge in word problem solving. Our knowledge based approach outperforms all other systems, and also learns better abstractions from biased datasets. Given that the variability in text is much larger than the number of declarative rules that govern math word problems, we believe this is a good way to introduce math knowledge into a natural language understanding system. Consequently, future work will involve extending our approach to handle a wider range of word problems, possibly by supporting better grounding of implicit variables and including a larger number of math concepts and declarative rules. An orthogonal direction is to apply these techniques to generate summaries of financial or sports news, or statistics of war or gun violence deaths from news corpora. A straightforward approach would be to augment a news document with a question asking for the required information, and treat this augmented document as a math word problem.

Code and dataset are available at https://github.com/CogComp/arithmetic .

Acknowledgments

This work is funded by DARPA under agreement number FA8750-13-2-0008, and a grant from the Allen Institute for Artificial Intelligence (allenai.org).

  • [Bisk and Hockenmaier, 2012] Yonatan Bisk and Julia Hockenmaier. 2012. Simple Robust Grammar Induction with Combinatory Categorial Grammars. In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12) , pages 1643–1649, Toronto, Canada, July.
  • [Chang et al., 2012] Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2012. Structured learning with constrained conditional models. Machine Learning , 88(3):399–431, 6.
  • [Chen and Manning, 2014] Danqi Chen and Christopher D Manning. 2014. A fast and accurate dependency parser using neural networks. In Empirical Methods in Natural Language Processing (EMNLP) .
  • [Clarke et al., 2010] James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving semantic parsing from the world’s response. In Proc. of the Conference on Computational Natural Language Learning (CoNLL) .
  • [DeJong, 1993] Gerald DeJong. 1993. Investigating explanation-based learning . Kluwer international series in engineering and computer science. Kluwer Academic Publishers.
  • [DeJong, 2014] Gerald DeJong. 2014. Explanation-based learning. In T. Gonzalez, J. Diaz-Herrera, and A. Tucker, editors, CRC computing handbook: Computer science and software engineering , pages 66.1–66.26. CRC Press, Boca Raton.
  • [Ganchev et al., 2010] Kuzman Ganchev, Joao Graça, Jennifer Gillenwater, and Ben Taskar. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research .
  • [Gimpel and Bansal, 2014] Kevin Gimpel and Mohit Bansal. 2014. Weakly-supervised learning with cost-augmented contrastive estimation. In Proc. of EMNLP .
  • [Hopkins et al., 2017] Mark Hopkins, Cristian Petrescu-Prahova, Roie Levin, Ronan Le Bras, Alvaro Herrasti, and Vidur Joshi. 2017. Beyond sentential semantic parsing: Tackling the math sat with a cascade of tree transducers. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 806–815, Copenhagen, Denmark, September. Association for Computational Linguistics.
  • [Hosseini et al., 2014] Mohammad Javad Hosseini, Hannaneh Hajishirzi, Oren Etzioni, and Nate Kushman. 2014. Learning to solve arithmetic word problems with verb categorization. In Proceedings of the Conference on Empirical Methods for Natural Language Processing (EMNLP) .
  • [Huang et al., 2017] Danqing Huang, Shuming Shi, Chin-Yew Lin, and Jian Yin. 2017. Learning fine-grained expressions to solve math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 816–825, Copenhagen, Denmark, September. Association for Computational Linguistics.
  • [Jia and Liang, 2017] Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 2021–2031. Association for Computational Linguistics, September.
  • [Khashabi et al., 2016] Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren Etzioni, and Dan Roth. 2016. Question answering via integer programming over semi-structured knowledge. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI) .
  • [Koncel-Kedziorski et al., 2015] Rik Koncel-Kedziorski, Hannaneh Hajishirzi, Ashish Sabharwal, Oren Etzioni, and Siena Ang. 2015. Parsing Algebraic Word Problems into Equations. TACL .
  • [Koncel-Kedziorski et al., 2016] Rik Koncel-Kedziorski, Subhro Roy, Aida Amini, Nate Kushman, and Hannaneh Hajishirzi. 2016. Mawps: A math word problem repository. In NAACL .
  • [Kushman et al., 2014] Nate Kushman, Luke Zettlemoyer, Regina Barzilay, and Yoav Artzi. 2014. Learning to automatically solve algebra word problems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) , pages 271–281.
  • [Liang et al., 2011] Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) .
  • [Ling et al., 2017] Wang Ling, Dani Yogatama, Chris Dyer, and Phil Blunsom. 2017. Program induction by rationale generation: Learning to solve and explain algebraic word problems. In ACL .
  • [Matsuzaki et al., 2017] Takuya Matsuzaki, Takumi Ito, Hidenao Iwane, Hirokazu Anai, and Noriko H. Arai. 2017. Semantic parsing of pre-university math problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 2131–2141, Vancouver, Canada, July. Association for Computational Linguistics.
  • [Mitra and Baral, 2016] Arindam Mitra and Chitta Baral. 2016. Learning to use formulas to solve simple arithmetic problems. In ACL .
  • [Naseem et al., 2010] Tahira Naseem, Harr Chen, Regina Barzilay, and Mark Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing , EMNLP ’10, pages 1234–1244, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • [Pennington et al., 2014] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Proc. of EMNLP .
  • [Roth and Yih, 2004] Dan Roth and Wen-tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. In Hwee Tou Ng and Ellen Riloff, editors, Proc. of the Conference on Computational Natural Language Learning (CoNLL) , pages 1–8. Association for Computational Linguistics.
  • [Roy and Roth, 2015] Subhro Roy and Dan Roth. 2015. Solving general arithmetic word problems. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP) .
  • [Roy and Roth, 2017] Subhro Roy and Dan Roth. 2017. Unit dependency graph and its application to arithmetic word problem solving. In Proc. of the Conference on Artificial Intelligence (AAAI) .
  • [Roy et al., 2016] Subhro Roy, Shyam Upadhyay, and Dan Roth. 2016. Equation parsing : Mapping sentences to grounded equations. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP) .
  • [Shi et al., 2015] Shuming Shi, Yuehui Wang, Chin-Yew Lin, Xiaojiang Liu, and Yong Rui. 2015. Automatically solving number word problems by semantic parsing and reasoning. In EMNLP .
  • [Smith and Eisner, 2006] Noah Smith and Jason Eisner. 2006. Annealing structural bias in multilingual weighted grammar induction. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) , ACL-44, pages 569–576, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • [Upadhyay et al., 2016] Shyam Upadhyay, Ming-Wei Chang, Kai-Wei Chang, and Wen-tau Yih. 2016. Learning from explicit and implicit supervision jointly for algebra word problems. In EMNLP .
  • [Chang et al., 2007] Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2007. Guiding semi-supervision with constraint-driven learning. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) , pages 280–287, Prague, Czech Republic. Association for Computational Linguistics.
  • [Zhou et al., 2015] Lipu Zhou, Shuaixiang Dai, and Liwei Chen. 2015. Learn to solve algebra word problems using quadratic programming. In EMNLP .

CogComp/arithmetic

Arithmetic Word Problem Solver

This repository contains the source code and data for automatic math word problem solving. It contains the implementation of three distinct arithmetic word problem solvers. It was developed by Subhro Roy at the Cognitive Computation Group at the University of Illinois at Urbana-Champaign.

Publication

This repository has been used in the following publications. Please cite them if you use the code or data.

Data

Data can be found in the file data/questions.json. This file constitutes our largest dataset, Aggregate. All other released datasets, namely allArith, allArithLex, allArithTmpl, aggregateLex, and aggregateTmpl, are subsets of Aggregate.

The data file is in JSON format. Each datapoint is represented as an object like the following sketch (reconstructed from the field descriptions below; the values shown are hypothetical):
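```json
{
  "iIndex": 1,
  "sQuestion": "John had 8 apples. He gave 3 apples to Mary. How many apples does John have now?",
  "quants": [8.0, 3.0],
  "lAlignments": [0, 1],
  "lEquations": ["X=(8.0-3.0)"],
  "lSolutions": [5.0],
  "rates": []
}
```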

"iIndex" is a unique integer identifier, "sQuestion" is the problem text, quants is the ordered list of numbers detected from the text, "lAlignments" refer to the alignment of numbers from the text to the equation (if i-th position has j, then the i-th number in the equation is mapped to the j-th number from the quants), "lEquations" is the list of equations, and lSolutions is a list of solutions. "rates" indicate the indices of quantities which represent a rate relationship. If the question represents a rate, "rate" will have -1.

The data/ directory has several folders, each representing a dataset. A dataset directory usually has a few fold files; each fold file contains the indices of the problems belonging to that fold. The dataset comprises the union of all the folds present in the folder.

Instructions to run the code

You will need to have Maven installed on your system. To download the dependencies, run:
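Presumably the standard Maven invocation (an assumption, as the original command is omitted here):

```
mvn dependency:copy-dependencies
```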

Next, compile using:
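Again, presumably the standard Maven build command:

```
mvn compile
```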

The main interface to the code is the script run.sh. It takes the following arguments (a usage example follows the list):

  • --data <data_file> : Data file containing all the questions (uses data/questions.json if not provided)
  • Rel : Trains and tests the relevance classifier. Output file log/Rel.out
  • Pair : Trains and tests the LCA classifier. Output file log/Pair.out
  • Vertex : Trains and tests the vertex label classifier. Output file log/Vertex.out
  • Edge : Trains and tests the edge label classifier. Output file log/Edge.out
  • GraphDecompose : Trains and tests the decomposed UDG prediction model. Output file log/GraphDecompose.out
  • GraphJoint : Trains and tests the joint UDG prediction model. Output file log/GraphJoint.out
  • LCA : Trains and tests the LCA++ solver (EMNLP 2015). Output file log/LCA.out
  • UnitDep : Trains and tests the UnitDep solver (AAAI 2017). Output file log/UnitDep.out
  • Coref : Trains and tests the coreference model or the rule selection model. Output file log/Coref.out
  • Logic : Trains and tests the logic model with gold rule selection and relevance. Output file log/Logic.out
  • E2ELogic : Trains and tests the end-to-end logic system. Output file log/E2ELogic.out
  • --train <train_fold_1> <train_fold_2> ... <train_fold_n> : List of fold files for training.
  • --test <test_fold_1> <test_fold_2> ... <test_fold_n> : List of fold files for testing.
  • --cv <cv_fold_1> <cv_fold_2> ... <cv_fold_n> : List of fold files for n-fold cross validation. Will not work when train and test are provided.
  • --model_dir <dir> : Location to save model files (uses models/ if not provided).
  • --print_mistakes : Option to print test questions the system got wrong.
  • --print_correct : Option to print test questions the system got correct.
  • --demo : Option to run terminal based demo of the system.
  • --demo_server : Starts the web demo, only for Cogcomp users.
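For example, a cross-validation run of the end-to-end logic solver might look like the following (the fold file paths are illustrative; substitute the fold files of the dataset you want to evaluate):

```
./run.sh E2ELogic --data data/questions.json --cv data/aggregate/fold0.txt data/aggregate/fold1.txt data/aggregate/fold2.txt
```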

Have a look at some of the scripts under scripts/ for examples of the commands used to generate the results in the papers.

Other issues

Please send any suggestions, comments, issues to [email protected] .


Parsing Algebraic Word Problems into Equations


Rik Koncel-Kedziorski, Hannaneh Hajishirzi, Ashish Sabharwal, Oren Etzioni, Siena Dumas Ang. Parsing Algebraic Word Problems into Equations. Transactions of the Association for Computational Linguistics 2015; 3: 585–597. doi: https://doi.org/10.1162/tacl_a_00160


This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees. We use integer linear programming to generate equation trees and score their likelihood by learning local and global discriminative models. These models are trained on a small set of word problems and their answers, without any manual annotation, in order to choose the equation that best matches the problem text. We refer to the overall system as ALGES.

We compare ALGES with previous work and show that it covers the full gamut of arithmetic operations, whereas Hosseini et al. (2014) only handle addition and subtraction. In addition, ALGES overcomes the brittleness of the Kushman et al. (2014) approach on single-equation problems, yielding a 15% to 50% reduction in error.

