William C. Bulko
IBM Corporation
11400 Burnett Road
Austin, Texas 78758
Copyright © 1992 by AAAI.
This article appears in 1992 AAAI Spring Symposium on Reasoning with Diagrammatic Representations,Stanford, CA, March 25-27, 1992 (AAAI Press).
This research was supported in part by the U.S. Army Research Office under contract DAAG29-84-K-0060. Computer equipment used in this research was donated by Hewlett Packard and Xerox Corporation.
Solving physics problems usually requires geometric reasoning; a computer problem solver must use a representation that is in some respects equivalent to the use of diagrams by human problem solvers. In this paper, we review the uses of diagrams and geometric reasoning in physics problem solving programs. Next, we consider the many roles that diagrams seem to play in human problem solving, the relative strengths and weaknesses of computers and human problem solvers for reasoning with diagram-like representations, and ways in which some of the uses of diagrams by humans might be implemented in computer programs.
The ISAAC program [novak:isaac,novak:ijcai] solves physics problems stated in English, in the area of rigid body statics. Figure 1 is the most geometrically complex of the 40 problems solved by ISAAC. ISAAC uses only English text as input; the diagram is produced from its understanding of the English and from calculated values. A geometric model, similar to the diagram, is used in problem solving; in the geometric model most objects are reduced to lines and points.
The program contains a geometric model for each type of object. This model gives the dimensions of a bounding box containing the object together with coordinates of named points on the object; there is also a program to draw a picture of the object corresponding to these sizes and coordinates. By scaling the geometric model in two dimensions by appropriate size factors, rotating it, and translating it to the proper position, the geometry of an instance of the object type is obtained. Of course, the scaling, rotation, and translation are done by vector operations on points, rather than by manipulation of an image.
Rigid body statics problems generally involve analysis of forces on a single ``main'' object; in any case, all of the objects in a given problem are attached to each other. This permits a simple algorithm to construct a model of a situation involving multiple objects. A single object is chosen and assigned the coordinates (0 0); the objects that are attached to it are then scaled to the appropriate size, rotated by the appropriate angle, and then translated so that the point of attachment has the same coordinates as the point in the composite model to which it is attached. Objects attached to the newly added objects are then added in the same manner until all objects are positioned in the composite model.
The foot of a ladder rests against a vertical wall and on a horizontal floor. The top of the ladder is supported from the wall by a horizontal rope 30 ft long. The ladder is 50 ft long, weighs 100 lb with its center of gravity 20 ft from the foot, and a 150 lb man is 10 ft from the top. Determine the tension in the rope.
Fig. 1: ISAAC Problem (Schaum p. 25, no. 19)
The above algorithm is sufficient provided that the attachments of objects form a tree structure, which is the case for most of our example problems. In the case of the ladder problem shown in Figure 1, however, a triangle must be solved in order to obtain the sizes and rotations of the objects that form the triangle. The triangle is detected by an ad hoc program that looks for triangles by checking whether an object a is attached to an object b that is attached to c, which is attached to a. If a triangle is detected, the known parameters are abstracted and given to a triangle solver, which returns the complete set of angles and sides of the triangle. The returned parameters must be translated back to the form needed for the definition of the object; in the example shown, the angle between the ladder and the vertical wall is returned, but the angle from the horizontal is needed as the rotation of the ladder.
The geometric model produced by ISAAC is entirely described by analytic geometry within a single planar coordinate system. Some lengths in the geometric model may be symbolic variables rather than numeric values; the solution of the equations produced for the physics problem in general produces values for these variables, so that numeric values will be available when the diagram is drawn. In general, however, this form of geometric model has weaknesses. Many geometric features of physics problems are unspecified in the problem statement. Although it would be possible to make a single, unified geometric model with symbolic values for the unspecified lengths, positions, and angles, to do so would complicate the algebra to such a degree that the equations would quickly become unsolvable, even with the help of symbolic algebra packages. Instead, we believe it would be better to have multiple local geometries that are locally precise regarding certain geometric features, with topological connections between the local geometries.
It is difficult to describe complex geometries in English. The BEATRIX program [bulko:diss,novak-bulko90] understands physics problems specified by a combination of English text and a diagram. In general, neither modality provides a complete description of the problem; it is necessary to produce a single, unified model that incorporates information from both. This requires that the understander establish coreference between parts of the text and diagram that refer to the same object or feature. An example of a problem understood by BEATRIX is shown in Figure 2.
Two masses are connected by a cable as shown in the figure. The strut is held in position by a cable. The incline is smooth, and the cable passes over a smooth peg. Find the tension in the cable for theta = 30 degrees and m1 = m2 = 20 kg. Neglect the weight of the strut.
Fig. 2: Test Problem A2
The input to BEATRIX is constructed by means of a user interface that allows drawing components to be selected and moved, scaled, and rotated as desired. The interface also allows entry of bits of text within the diagram, as well as entry and editing of the English problem statement. The drawing is displayed in a window as it is constructed. As a side effect, a symbolic description of the items in the diagram is constructed; this serves as input to the understanding program.
We have taken care to ensure that the diagram input consists of ``neutral'' components such as lines, circles, and rectangles -- a form of input that could reasonably be produced automatically from a printed diagram by a machine vision system [ballard-brown]. Thus, BEATRIX does not perform pixel-level analysis of diagrams, but is assumed to be one level above components, such as line detectors, that do pixel-level analysis.
Understanding of diagrams shares many of the difficulties of understanding natural language: ambiguity of the meanings of elements, ambiguity of combination of elements into groups, and underspecification. An element such as a line is ambiguous because it might represent an edge of an object, a shared boundary between two objects, or an object in itself (such as a cable). Groups of lines can often be combined in many possible ways, only a few of which are correct. Diagrams often omit things that can easily be inferred; for example, the attachment between a rope and an object that it supports is usually not represented by anything other than contact of the diagram elements. As in speech understanding [hearsay], ambiguity can be reduced by making use of various constraints:
A person reading a physics problem probably will not read all of the text, and only then look at the diagram, or vice versa; instead, the person will typically look briefly at the picture, read some text, look back at the picture, and so forth until the problem has been understood. It is unlikely that any fixed order of processing would suffice for a broad selection of problems, especially since a given problem might be specified entirely by text, entirely by a diagram, or by a combination of the two. For this reason, BEATRIX is organized to allow co-parsing of the two input modalities, using the BB1 blackboard system [bb1-manual].
The input to BEATRIX consists of an analytic-geometry description of the diagram in terms of points, lines, rectangles, and circles. Although BEATRIX does not deal with a diagram represented as pixels, it does perform some low-level analysis of details, e.g. to determine whether one line terminates perpendicularly at another line, or whether a line is approximately tangent to a circle.
In this section, we consider the various roles that a diagram may play in human problem solving. Some of these roles have previously been described in [larkin-simon:diag10k].
A central feature of human intelligence is limited short-term memory [miller:magic7]. By writing down intermediate results, a person frees limited short-term memory for other uses; writing and re-perceiving the intermediate results is much faster than memorizing them. Surely diagrams play such a role also. Consider the following problem: [This problem was taken from a precalculus midterm exam.]
It is known about a triangle ABC that AB = 1 , angle C = 90o , angle A = alpha . CD is the perpendicular from C to AB (see picture). (a) Find the angle B and the lengths of AC , BC , and CD . (b) Find alpha for which CD = 1/2 .This is not a difficult problem, but it is hard to keep track of the components and their relationships mentally if no diagram (Figure 3) is available. The problem solver often will progressively annotate the diagram with derived results, making these results immediately available by inspection when needed. Such retrieval by inspection is often opportunistic, i.e., a glance at part of the diagram shows the needed value sitting there, without explicit planning for use of that value.
In some cases, a diagram serves the function of a ``coordinate system'' or substrate, allowing the remainder of a problem to be described relative to a substrate that has been mentioned and which the reader is assumed to understand. For example,
A car leaves point A and drives north for 6 miles at 30 MPH ...Since the natural language problem statement makes definite references to geometric features of the substrate, a detailed model of the substrate geometry is required to understand such problems.
A punter located at his 40-yard line kicks a punt at an angle of 45 degrees to a receiver at the opposite 20-yard line ...
A fundamental problem faced by many A.I. systems is underspecification of a problem or situation. A diagram can help the problem solver to infer the correct context by encouraging the elaboration of elements normally associated with the diagram. A simple example of an underspecified problem is the following problem solved by ISAAC:
What force is required to lift one end of a pole?To a person, a drawing of a horizontal pole supported only by a force at one end ``looks wrong''; the exercise of drawing a free body diagram may help a human problem solver to consider all the relevant forces until the set of forces drawn on the body looks balanced. In the above problem, ISAAC assumes (by symbolic inference) that the other end of the pole is supported by a pivot; it also assumes that the pole is uniform (having its center of gravity at its geometric center) and that it has a nonzero length and weight. Physics problems often omit important geometric facts, e.g. that objects rest on the surface of the earth, or that walls are vertical planes that are bounded below by horizontal floors.
Larkin and Simon [larkin-simon:diag10k] describe ``perceptual'' inferences as the major advantage of the use of diagrams. While such inferences ( e.g. the fact that vertical angles formed by intersecting lines are equal) can be made symbolically, they can be made at almost no cost by perception.
Larkin and Simon describe perceptual inferences that are essentially identical to symbolic inferences that can be made formally. While perceptual inferences may suggest subproblems that can then be treated formally ( e.g., the perception that vertical angles appear to be equal may trigger the memory that this is indeed a theorem in geometry), we believe that many perceptual inferences are made and accepted without proof or even much thought. For example, in the problem of Figure 2 the problem solver will make the assumption that the string is parallel to the inclined plane; this is unstated and cannot be formally proved due to underspecification. Thus, we believe that many perceptual inferences are of the form ``these things are equal because they appear to be equal''.
A skilled problem solver will deliberately construct diagrams to facilitate inference by recognition. For example, in the problem of Figure 3, a skilled problem solver will draw the figure so that the angle is clearly less than 45o , thus facilitating recognition that the angle alpha' is the same as alpha. While this can be proved, the problem solver will probably not perform a proof, but will simply assume that all angles that are not 90o are either the ``small'' angle or the ``large'' angle.
Figure 3: Analytic Geometry Problem
It appears that ``perceptual'' inferences are important in related domains, even when diagrams are not used. For example, a person skilled in performing mental arithmetic can perform the mental calculation:
4   /   .97   ~=   4.12by recognizing the problem as an instance of the pattern:
1   /   (1 - epsilon)   ~=   (1 + epsilon),   where epsilon is small.The recognition that .97 is ``almost 1'' seems to be a perceptual inference that would need to be made in order to trigger a production rule for this pattern. There is evidence that experts can make large numbers of such perceptual inferences, which may be an important component of their expertise. For example, Feynman ([feynman:joking], Lucky Numbers) boasted (falsely) that he could compute exponentials in his head; but he confounded his friends' attempts to expose his deception because he was able to recognize special cases that allowed him to do every example they presented to him.
People seem to be able to perform at least the following kinds of recognition perceptually using diagrams:
Some inference rules used with diagrams seem almost to be ``plastic overlays'' that can be moved into position and added to a diagram. The right-hand rule of electromagnetic fields is such an operator, which may be invoked with actual movement of the hand. The rule that ``sine = opposite / hypotenuse'' can be thought of as a diagrammatic operator (Figure 4) that can be moved (mentally) into position and then used to add inferences directly to the diagram:
Figure 4: Sine Rule Overlay
An advantage of such diagrammatic operators is that they can be used locally by making simple mental transformations such as translation, rotation, and reflection to make the diagrammatic operator match the existing diagram. Intermediate results that are written on the diagram become available for subsequent use, so that a chain of such operations can be used. For example, in the problem of Figure 3, the sine rule can be applied to the large triangle to find that BC = sin(alpha) ; this value can then be used as input to a cosine rule for the smaller triangle sharing the side BC to find CD = sin(alpha) * cos(alpha).
We have proposed [kook:diss,kook-novak-tkde] that the analysis of a physics problem should be represented not just as a set of equations, but as sets of correspondences between problem features and physical models. In more complex problems, a single object in the problem may have multiple views in different physical models. When represented as networks, the correspondence sets become large and complex. A diagram can serve as a compact representation of such correspondences, in which a particular part of the diagram carries (in one form or other) multiple labels.
One form of such multiple-view diagrams shows the formal geometric model drawn on a picture of the actual objects, as in the following example (Figure 5) of a claw hammer drawn as a lever problem ([miller], problem 64).
Figure 5: Hammer Geometry
Diagrams are often furnished with statements of physical laws [gieck]. Such diagrams presumably facilitate retrieval of the appropriate formula from memory when a similar problem diagram is seen. In addition, the diagram facilitates matching between problem features and corresponding features of the physical model because the corresponding features appear in similar locations in each diagram. Chi et al. [chi] found that novices sorted physics problems primarily on the basis of diagrammatic similarities.
For example, consider a centrifugal force diagram, as given in [gieck], and a problem such as:
Given the gravitational constant G and the known facts about the orbit of the earth, calculate the mass of the sun.
Figure 6: Centrifugal Force Law and Planet Problem
These diagrams (Figure 6) immediately suggest the correct correspondences: the sun corresponds to the center of the circle, the earth to the mass (suggesting that the earth be ``coerced'' to a point mass), the radius r to the earth-sun distance d , and the velocity v to the velocity of the earth along its orbital path (which then becomes a subproblem to be solved).
Larkin and Simon [larkin:cogsci] proposed the representation of problems and of physical situations as directed graphs and the use of graph-matching algorithms to find and invoke appropriate physical models. This may be difficult, both because graph matching is computationally intractable and because graphs are prevented from matching by missing or extra nodes. Diagram matching may be more useful because diagrams that represent physical principles can be indexed by their ``main'' features, such as circular motion, which are likely to have only one or a few matches in a given problem. In addition, a match between a diagram and a given problem need not be exact: extra elements in the problem do not matter, and missing elements can be ignored (if the corresponding variable is not used) or taken as subproblems to be found.
An interesting approach to teaching physics problem solving might be to allow the student to select physical models for a given problem and to instantiate those models by mouse clicks on corresponding points, e.g. selecting the sun as the center of the circle, the earth as the mass, etc. The result would be instantiated equations and subproblems that remain to be solved. A solvable equation set could be solved automatically by symbolic algebra. This would move the focus of problem solving away from algebra and toward selection and instantiation of physical models. Like the use of pocket calculators (which seems to have caused students to lose the habit of mentally checking the plausibility of answers that was required with the use of a slide rule), such an approach might cause degradation of algebraic skills along with improvement in modeling skills. However, we have observed students rushing headlong into thickets of incorrect equations (from which they never emerged) rather than reconsidering the sources of the equations, so such a shift in emphasis may be beneficial.
A related research direction is machine learning of methods for analyzing particular kinds of problems based on correspondences, selected as described above, made by a physics expert. For example, the result of the correspondences described above between the centrifugal force law and the earth-sun system could be a general method for automatically producing the correspondences between a planetary system and the centrifugal force law; moreover, the fact that such a correspondence is known could suggest the use of this law in a new problem or even, by analogy, in a related problem such as the Bohr model of the hydrogen atom. Thus, learning of the method of application of physical principles could be a useful form of ``chunking'' that would allow future problems of a similar type to be solved more rapidly and automatically as a result of expert-guided practice.
Skilled problem solvers often use ``gedanken experiments'' involving diagrams to analyze problems; such analysis may be used to determine the direction of change in a system, equilibrium points, bounding points or extrema, connectivity between parts of a system, or how a change in one quantity will affect another. An excellent example is a method for determining whether a structural member of a bridge is under tension or compression: imagine the bridge with the member removed and imagine how the bridge would collapse. If the member would become shorter in the collapse (Figure 7), it must be under compression.
Figure 7: Removal of Bridge Member
The differences between human abilities to deal with diagrams and the
abilities of existing computer systems are striking:
Humans: | Computers: | |
Short-term memory: | Very limited | Large |
Perception: | Easy | Slow, difficult |
Calculation: | Poor; avoided | Easy, fast |
These differences, and especially the difficulty of perception of diagrams (still a research problem in itself), suggest that it would be unprofitable to try to duplicate human diagram processing directly. However, machine processing at a ``sketch'' level above the level of direct perception may be reasonable.
We believe that it is possible to identify a set of basic ``perceptual'' operators, analogous to those that people use with diagrams but implemented above the pixel level of an actual diagram, and to implement these to take advantage of the strengths of the computer.
A representation of geometric features such as lines, points, and circles by means of analytic geometry seems most appropriate for computer processing. Such a representation will not necessarily be exact; there may be small truncation errors, especially if the original input source was a pixel-based drawing program such as the one that provides input to BEATRIX. Nevertheless, such a representation should be sufficiently accurate to determine approximately such geometric features as a line terminating at another line, a line tangent to a circle, parallel lines, etc.
Geometric features such as lines should be connected, bidirectionally, with problem features that are represented symbolically. Sometimes geometric features represent actual objects, but in other cases they represent relationships (such as the earth-sun distance) or variable values. It must be possible to post values to the diagram representations; in this way, a diagram representation can serve the short-term memory function and allow opportunistic use of intermediate results that are ``read'' back from the diagram.
It should be possible to group geometric objects into (multiple) groups that are treated as units; for example, in the bridge problem above, two triangles formed of bridge members are treated as rigid bodies in analyzing how the bridge would collapse with a member removed.
A library of substrate models is essential if minimally specified problems, such as those found in textbooks, are to be understood. For example, a statement such as ``a ladder leans against a wall'' implies the existence not only of the wall, but also of a floor, and support of the bottom of the ladder by the floor. It is reasonable to assume that a prototypical diagrammatic representation of the typical spatial relationships of a ladder, wall, and floor is stored for use in understanding such problems. Careful reading of textbook problems shows that the reader is assumed to have and be able to employ such knowledge.
Perceptual operators ( e.g. detection of parallel lines, triangles, etc.) can operate at the analytic geometry level as special-purpose programs distinct from the production rule or other symbolic analysis system. We believe that ``noticing'' these features can be done rapidly if done by special-purpose programs that perform only this function. Noticing of features can be used to direct the attention of the problem analysis system to make inferences based on observed relationships. For example, the BEATRIX program uses the facts that two lines are tangent to a circle to infer the existence of a pulley system, resulting in the identification of the two lines as parts of a single rope. In some cases, things that are noticed should be assumed to be true (as when the rope appears to be parallel to the inclined plane); in other cases, noticing an apparent equality could trigger an attempt to prove equality by more rigorous methods.
Another kind of perceptual inference is relating of similar models. For example, in relating the earth-sun system to a circle, there are correspondences between the location of the sun and the center of the circle, between the earth-sun distance and the radius of the circle, etc. Here, a stored relationship between a physical relationship and a diagram of that relationship could be used to relate corresponding parts of two situations that have similar diagrams. For example, ``x orbits y'' corresponds to a circle diagram in which y is the center of the circle, x is a point on the circumference, and the distance between x and y is the radius. Given a similar diagram associated with a physical law (as in Figure 6), correspondences can be drawn between the elements that correspond to similar diagram elements. In this way, the diagrammatic representation becomes the basis for expressing the isomorphism between a problem situation and its physical model.
We have described uses of diagrams in existing programs that solve physics problems and considered additional ways in which diagrams appear to be used by humans. By implementing ``perceptual'' operations at a level below the operation of production rules or other symbolic reasoning, and by making use of relationships expressed as correspondences between diagrams, it may be possible to gain some of the advantages that humans derive from diagrams for computer problem-solving systems.
[ballard-brown] Ballard, D. H. and Brown, C. M., Computer Vision, Prentice-Hall, 1982.
[bulko:diss] Bulko, W., Understanding Coreference in a System for Solving Physics Word Problems, Ph.D. dissertation, Tech. Report AI-89-102, A.I. Lab, CS Dept., Univ. of Texas at Austin, 1989.
[bb1-manual] Garvey, A., Hewett, M., Schulman, R., and Hayes-Roth, Barbara, ``BB1 User Manual -- Interlisp Verison'', working paper KSL 86-60, Knowledge Systems Lab, Stanford Univ., 1986.
[chi] Chi, M., Feltovich, P., and Glaser, R., ``Categorization and Representation of Physics Problems by Experts and Novices'', Cognitive Science, vol. 5, no. 2 (April 1981), pp. 121-152.
[feynman:joking] Feynman, R. P., Surely You're Joking, Mr. Feynman!, New York: Norton, 1985.
[gieck] Gieck, K., Engineering Formulas, 5th ed., McGraw-Hill, 1986.
[hearsay] Erman, L. D., et al., ``The Hearsay-II Speech-Understanding System: Integrating Knowledge to Resolve Uncertainty'', ACM Computing Surveys, vol 12, no. 2 (June 1980), pp. 213-253.
[kook:diss] Kook, Hyung Joon, A Model-Based Representational Framework for Expert Physics Problem Solving, Ph.D. dissertation, Tech. Report AI-89-103, A.I. Lab, C.S. Dept., Univ. of Texas at Austin, 1989.
[kook-novak-tkde] Kook, Hyung Joon and Novak, G., ``Representation of Models for Expert Problem Solving in Physics, IEEE Trans. on Knowledge and Data Engineering, 3:1, pp. 48-54, March 1991.
[larkin:cogsci] Larkin, J. and Simon, H. A., ``Learning through Growth of Skill in Mental Modeling'', Proc. Cognitive Science Society, 1981; also in [simon:mot2].
[larkin:lmss] Larkin, J., J. McDermott, D. Simon and H. A. Simon. ``Expert and Novice Performance in Solving Physics Problems'', Science, 208 (20 June 1980), pp. 1335-1342.
[larkin-simon:diag10k] Larkin, J. and Simon, H. A., ``Why a Diagram is (Sometimes) Worth 10,000 Words'', Cognitive Science, 11:65-99, 1987; also in [simon:mot2].
[miller] Miller, F., Progressive Problems in Physics, Boston: D.C. Heath, 1949.
[miller:magic7] Miller, G. A., ``The Magical Number Seven, Plus or Minus Two'', Psychological Review, 63:81-97, 1956.
[novak:glisp] Novak, G., ``GLISP: A LISP-Based Programming System With Data Abstraction'', A.I. Magazine, vol. 4, no. 3 (Fall 1983), pp. 37-47.
[novak:isaac] Novak, G., ``Computer Understanding of Physics Problems Stated in Natural Language'', Am. J. Computational Linguistics, Microfiche 53, 1976.
[novak:ijcai] Novak, G., ``Representations of Knowledge in a Program for Solving Physics Problems'', IJCAI, 1977, pp. 286-291.
[novak-bulko90] Novak, G. and Bulko, W., ``Understanding Natural Language with Diagrams'', Proc. Eighth National Conference on Artificial Intelligence (AAAI-90), 1990, pp. 465-470.
[simon:mot2] Simon, H. A., Models of Thought, vol. 2, Yale Univ. Press, 1989.