EWD316: A Short Introduction to the Art of Programming
by
prof.dr.Edsger W.Dijkstra
August 1971
0. Contents
1. Preface
2. Some Fundamental Notions
3. Programming Languages and their Implementation
4. Variables and relations between their values
5. Programs corresponding to recurrence relations
6. A first example of step-wise program composition
7. The shortest spanning subtree of a graph
8. The towers of Hanoi
9. The problem of eight queens
10. A rearranging routine
3. Programming Languages and their Implementation
The activity of composing programs is called "programming". In the preceding section we have introduced programs as algorithms intended to control the behaviour of machines and by virtue of the actual existence of such machines, programming is a very practical activity. It is one of the youngest branches of applied mathematics (in the broad sense of the word, i.e. not confined to mathematical physics or numerical analysis), it is as important as the applications in question, it is practical in the sense that it is the programmer's intention that a machine will actually display the behaviour as prescribed by the algorithm. For that reason a conscious programmer should respect the limitations of the (finite) machines. Alternative programs causing a machine to establish the same net result and therefore in that respect equivalent, may differ greatly in what is usually called "efficiency", i.e. in the demands they make upon the machine's resources. For many years, efficiency has been used as the sole yard-stick along which to compare the relative quality of alternative programs for the same task. In the meantime, programming has turned out to be so difficult, that other quality aspects have gained relative importance, such as the ease with which we can understand a program, can convince ourselves of its correctness or can modify it, etc. Yet, efficiency concerns cannot be ignored and in order to give the reader some feeling for the nature of the limitations he should respect, we shall give in this section an overall view of computing machines and the way in which they execute programs.
In this little monograph we shall confine our attention to sequential algorithms, i.e. algorithms describing what actions should happen in succession, one after the other. Such algorithms have a property for which they have been blamed (and not entirely without justification), viz. that they are often "overspecific" as regards the order in which things have to happen. If two actions, say "A" and "B" have both to be done, a purely sequential algorithm will prescribe
either "A; B" or "B; A" ,
viz. action A followed in time by action B or the other way round. It will not express that the order is immaterial and, what is possibly more important, it will not express that the two actions are so "non-interfering" with each other that they may take place concurrently, or —to use the jargon— may be done in parallel.
For various reasons I have decided to restrict my attention to purely sequential programs. The most obvious reason is to be found in the structure of the machines that are currently available or can be expected to become available in the next years. One or two decades ago, machines used to be purely sequential. In the meantime we have got equipment allowing for a limited amount of parallelism (dual processor machines, independent communication channels etc.), but such pieces of equipment are at best an aggregate of a small number of individual sequential components. In such machines the potential parallelism of activities is exploited by standard control programs (so-called "operating systems"), while the individual user still works in a strictly sequential environment. And it is to the individual user that this little monograph addresses itself.
With the advent of what is called "large scale integration" (being a term from the computer field, its acronym LSI is better known!) it seems to become technically feasible to build machines more like "clouds of arithmetic units" with information processing activities going on simultaneously all over the place, for shorter periods of time even independently of each other. Programming for such machines will pose completely different trade-off problems: one will be willing to invest in potentially useful computing activity before its actual usefulness has been established, all for the sake of speeding up the whole computation. But although I know such machines may be coming, I shall not touch these problems for the following reasons. First, as far as general purpose applications are concerned, I have my doubts about the effectiveness with which such forms of parallelism can ever be exploited. Second —and that is the most important consideration— parallel programming is an order of magnitude more difficult than sequential programming. (This statement will be doubted but I have enough experience in multiprogramming to feel myself entitled to say so. The point is that with parallelism a great variety of happenings may take place under control of the same program(s). On account of undefined speed ratios a set of parallel programs is written for a partly non-deterministic machine and special care is required to ensure that, on a higher level of abstraction, its total behaviour can again be regarded as uniquely determined by the program(s).) Third, I am not over-impressed by the complaints that sequential programs specify a more stringent time-succession than logically necessary: I have often the somewhat uneasy feeling that these complaints finds their origin in the mathematical tradition of the pre-computer age. In classical mathematics the notion of an algorithm has been neglected; mind you, I am not blaming the previous mathematicians for this because before the actual existence of automatic computers, algorithms were hardly a relevant subject. But we should not close our eyes to the fact that the course of history has caused mathematics to be more tuned to timeless problems to static relations, to functional dependence. (The existence of classical mechanics does not contradict this observation: renaming the independent variable in the differential equations "k", say, instead of the usual "t" does not influence the mathematics involved.) Some of the efforts to remove the overspecification of the time-succession —they rely heavily on functional dependence— strike me as tackling the programming problem with classical concepts that have been developed for other purposes. So much for my decision to restrict my considerations to sequential machines.
To get some feeling for the demands made upon the modern automatic computer, let us focus our attention for a moment upon an average sizable computation, for instance, the computation of (a good approximation of) the inverse of a given matrix of, say, 100 by 100 elements. Such a job has two markedly quantitative aspects:
a) | a vast amount of numbers is involved: posing the problem implies the specification of 10.000 numbers, the answer is also given by 10.000 numbers (each of which is, in general, a function of all 10.000 elements of the given matrix) |
b) | a vast amount of computation has to be done: if it is done by elimination, the number of operations (i.e. multiplications and additions) is of the order of magnitude of 1.000.000. |
The construction of machines able to cope (reliably!) with these two very different aspects of "multitude" is one of the greater triumphs of electronics. It has been achieved by applying the old and well-known principle: "Divide and Rule.". In modern computers one can distinguish two vital components, each of which has the specific task to cope with one of the forms of multitude.
a) | the store (called "memory" in American); this is the component able to receive, store and return vast amounts of information; its primary function is to be large, to be able to contain very much information |
b) | the arithmetic unit or processor; this is the component in which the actual work —adding, subtracting, multiplying, comparing etc.— is done; its primary function is to be very fast so that it may do a great deal in a limited period of time. |
It is not the function of the arithmetic unit to be large in the sense that it should contain large amounts of information. On the contrary: while nearly all the information, relevant for te computation at large, lies "sleeping" in the store, at any moment of time only the tiny fraction actually involved in the information processing activity is found (copied) in the arithmetic unit, which is only capable of dealing with a few numbers at a time, say the two numbers to be added and the sum formed by the act of addition. Whenever two numbers (in store) are to be added, they are transported from the store to the arithmetic unit, where the sum will be formed; once formed the sum will either be kept in the arithmetic unit for immediate further processing or it will be sent back to store for later processing. Microscopically, the store acts as the icebox in which all information is kept which is not involved in the current activity of the arithmetic unit. If small letters indicate variables in store and R indicates a register in the arithmetic unit, the computation
x := (a + b)*(c + d)
might be evoked by the following sequence of instructions:
R:= a;
R:= R + b;
t:= R;
R:= c;
R:= R + d;
R:= t * R;
x:= R
The first instruction fetches the value of a from store into the register R, the next one increases (the contents of) R by the value of b (from store). At this stage one of the two factors to be multiplied has been computed. Before the multiplication can take place, the second factor has to have been computed as well; in a machine with a single register R for arithmetic results, this second addition implies again the use of the register R. In order to make this register available for this purpose, the third instruction sends the value of the first factor —a so-called "intermediate result"— back to store, assigning it to a variable here named "t": the first sum is sent back to store for later usage. The fourth and fifth instructions compute the second factor, the value of which is left in R, ready for multiplication by the stored valued called "t". The last instruction stores the product, now formed in R, so that it can be retrieved under the name "x" for later usage.
The above example illustrated many things. It shows how
x:= (a + b)*(c + d) ,
which on one level of interest can be regarded as a single action, on closer inspection turns put to be a sequential process taking place as a time-succession of seven more primitive sub-actions ("program steps"). It also shows that at any moment in time only a tiny portion of the algorithm is in active control of what actually happens: while the first of the second addition is performed, the fact that the two sums will have to be multiplied is still "dormant". (If the total action had been
x:= (a + b)/(c + d) ,
the only modification necessary would have been the replacement of the sixth instruction by
R:= t / R ;
the first five instructions would have been insensitive to this change. That is what I meant by "dormant".)
It also demonstrates that, just as at any moment in time, only a tiny fraction of the numerical information is involved in actual processing, also only a tiny fraction of the program exercises control, viz. the instruction currently executed.
It also demonstrates that it is no good just to divide the machine into two components, store and arithmetic unit, but that one must also provide for a (dense) information traffic between the two; this is provided for by what connects the two together, the so called "selection".
We have said that the store should be able to store information; it must, for instance, be able to store "numbers", e.g. the intermediate result called "t". Obviously, these numbers cannot be kept in store like balls in an urn: when the instruction
R:= t * R ;
has to be executed, the store must not return just any number, but quite definitely it must return the value sent to it two instructions earlier. For that reason the numbers in store are not arranged as balls in an urn, on the contrary! Store is arranged as a number of so-called "storage cells", each capable of holding the value of one number at a time. Each storage cell is identified by its so-called "address"; each time contact with the store is required —either to receive or to return information— this request is accompanied by a statement of the address of the storage cell involved. If the store is to receive information —this is called "writing into store"— the value to be stored and the address of the storage location involved (plus a "write request") are sent to store and selection respectively; as a result of the writing operation the original contents of the storage cell, which get lost, are replaced by the new value. If the store is to return information —this is called "reading from store"— the address of the storage cell involved (plus a "read request") is sent to the selection; as a result the contents of the storage cell are returned from store (and kept in the storage cell as well for later reading if desired). As far as destruction, reception and reproduction of the information contained in a storage cell are concerned, the situation shows great analogy to a tape in a tape recorder. You can use the tape to record as many pieces of music as you want, but only one at the same time: whenever you record a new piece of music on an old tape, its previous contents are wiped out; the piece of music currently recorded, however, can be played back as many times as you wish. (To make the analogy a true one, we must restrict ourselves to pieces of music of equal duration, precisely matched to the length of the tape, matched to its (finite) information capacity.
Storage cells can store information by virtue of the fact that they can be in a finite number of distinct states. In practically all computers they are composed of elementary components, each of which can be in one of two possible states. (The most common form is a little ring of ferromagnetic material that will be circularly magnetized in one of the two possible directions.) One such component can be in 2 different states (say "North" and "South"), two such components can be together in 4 different total states, ("North-North", "North-South", "South-North" and "South-South"), N such components together can be in 2N mutually different states. The number of elementary components associated with each storage cell is a characteristic constant of the machine's store and is called "the word length". If the word length is 32, the number of different possible total states per word is 232, i.e. slightly over 4*109; the arithmetic unit will associate with each state a numerical value; in terms of these numerical values a storage cell can then hold, for instance any integer value ranging from (roughly) −2*109 to +2*109.
The finite capacity of the storage cell is something the user must be aware of: it is matched to the abilities of the arithmetic unit, i.e. if the latter deals with integer values it is geared to operations up to a certain maximum absolute value, if it deals with (approximations of) real numbers, it is geared to dealing with them in a certain precision, maximum absolute value and precision respectively being chosen such that the numerical values to be distinguished between can be stored in one (or possibly two successive) storage cells. If greater integers or reals in higher precision have to be manipulated, special measures have to be taken which will be more expensive.
In the meantime we have explained enough about the general machine structure to mention to aspects of the "costs" involved in the execution of a program. One of them is computation time. We have seen that the arithmetic unit performs one operation after the other, and the more operations a program prescribes, the longer the total amount of time the arithmetic unit will have to spend to carry the computation to completion. The other one is storage usage. If a thousand values have to be computed in order to be added together, we may compare the following two algorithms. The first one first computes all thousand values and stores them, after which they are added, the second algorithm immediately adds each number to the partial sum as soon as it has been computed. With regard to storage usage the first algorithm is more demanding: at some stage of the computation it requires a sufficient amount of store to hold all thousand values, an amount of store which in the second algorithm, remains available for other (perhaps more useful) purposes.
So much for the finite capacity of each storage cell and the fact that a store contains only a finite number of such cells. Let us return to their addresses: a while ago we hinted that each storage cell is identified by an "address" and that each reference to store takes place under control of the address of the storage cell concerned, but up till now we have not been very explicit about what an address really is. Well, this is very simple: the storage cells are numbered: 0, 1, 2, 3, 4, 5, ... up to and including M - 1 if the store comprises M different storage cells (M between 16.000 and 1.000.000 being typical figures), and the ordinal number of each storage cell is used as "its address" (like houses in a street!) This implies that the storage cells have a natural order, viz. the order of increasing address. Given the address of a storage cell, the address of the next storage cell can be computed by adding 1 to the given address of the preceding one.
This natural ordering of the storage cells is heavily exploited. If a vector, i.e. a sequence of numbers a0, a1, ... , an has to be stored, its elements can be stored in successive storage cells. If the address of element a0 is known, the address of element ai can then be computed, viz. by adding (the value of) i to the address of the element a0.
The natural order of the storage cells is also exploited in the program representation. Remember, we have postulated that a machine could "accept" a program, and that, once the program had been accepted, the machine could execute the program (i.e. cause the happening as prescribed by the program). In other words, when the machine is executing the program, this program —i.e. "the information describing how to behave"— must be somewhere in the machine! Where? Well, in the store, the store being specifically the machine component able to hold information. In other words, the store is used for two different purposes: it holds the numerical information to be manipulated, but it also holds —in some other part of it— the program, i.e. the information describing what manipulations have to be performed.
To sketch briefly how this can be done, we return to our previous example, where
x := (a + b)*(c + d)
was decomposed into the sequence of seven instructions denoted by
R:= a;
R:= R + b;
t:= R;
R:= c;
R:= R + d;
R:= t * R;
x:= R
and the question is: by means of what conventions do we represent the above information in a store capable of holding numbers? This is achieved by a two-stage convention, one for representing single instructions and one for representing a sequence.
The first convention is to choose for each instruction a unique number code. In the above notation we have denoted variables (or: the addresses of the storage cells associated with the variables and containing their current value) with small letters (a, b, c, d, t, and x); but addresses are numbers and that component is therefore already numerical. Using "s" for "any address" we see that in the above example, we can distinguish instructions of four different types:
1. | R := s |
2. | R := R + s |
3. | R := s * R |
4. | s := R |
The second part of the convention associates with each type of instruction a number (e.g. the numbers 1, 2, 3, and 4 to the types shown above; it should be mentioned that in actual machines the number of instruction types is considerably larger than 4). By concatenating the digits describing the instruction type number with those giving the address we have a number code for each possible instruction and we assume that the word length of the storage cell is sufficient to contain such a number. The first convention as just described, reduces the problem of storing a sequence of instructions to storing a sequence of numbers. Now the second convention is that the sequence as such is represented by storing these numbers in successive storage cells, i.e. storage cells with successive addresses. And this completes (roughly) the trick.
(Note. This is by no means the only way to represent programs inside the machine's store; in so-called "stack machines" other conventions are chosen. The above elaboration is only shown by way of example, demonstrating the possibility.)
The dual role of the store —storage of instructions and storage of variables— implies another way in which a program can be expensive to execute: if the program text is very long, by that very fact the program text will make a heavy demand on storage capacity. If we have two alternative programs for the same job, one requiring 5000 instructions to describe it and the other requiring 10000 instructions to describe it, then —all other things being equal— the first alternative will be cheaper.
The above concludes our bird's eye view of the so-called hardware machine, i.e. the physical piece of electronic equipment as it is delivered by the manufacturer: a machine that can accept and then execute programs written as long sequences of instructions from an instruction repertoire that is specific for this particular machine (and its copies). Until the late fifties programmers indeed produced their programs as long sequences of such instructions, but when machines became faster and when more came on the market, the shortcomings of this way of working became more and more apparent.
Because the programmer expressed his program in terms of the instruction repertoire of that particular machine, he was forced to tailor his program to that machine. He needed a thorough and ready knowledge of all the details of the instruction repertoire of that machine —which for the more intricate machines was no mean task— and worse: once his program was written, it could only be executed by that particular machine. Exchange of programs between institutes equipped with different machines was impossible; furthermore, whenever an institute replaced its old machine by a new and different one, all programs for the old machine became obsolete. From that point of view it was clear that tailoring one's programs so closely to the idiosyncrasies of a specific piece of hardware was not a very wise investment of the intellectual energies of one's programmers.
But even without the problem of transferring programs from one hardware machine to another, this way of programming, expressing programs as a monotonous stream of machine instructions, showed great drawbacks. One serious drawback was that this close contact between programmer and physical machine did not only enable the programmer to lard his program with all sorts of coding tricks, it actually invited him to do so. For many a programmer this temptation became irresistible; there even has been a time when it was generally believed that one of the most vital assets of a virtuoso programmer was that he be "puzzle-minded", and it was only slowly recognized that a clear and systematic mind was more essential! When "tricky programming" was en vogue, programming was not only very expensive —it took too much time— it also turned out to be too difficult to get a program correct. Looking back, the period of tricky programming now strikes us as a generation of programmers walking on a tight-rope, in full confidence because they were unaware of the abysmal depth beneath it! The modern competent programmer is more humble and avoids clever tricks like the plague.
It was not only the preponderance of coding tricks that made programming "in machine code" as it is called nowadays, too difficult and too risky.
Firstly, a program in machine code contains very little redundance and as a result it is very sensitive to even small writing errors —errors of the level of "spelling mistakes" or "printing errors".
Secondly, the programmer who thinks in terms of variables has to denote these variables in his program text by the addresses of the storage cells dedicated (by him) to hold their values. As a result the programmer has the burden of storage layout and all the clerical work implied by this.
Thirdly —and this is the major reason— machine code is an improper vehicle to represent "structure": it is just a single, monotonous sequence of machine instructions, incapable of expressing in a direct and useful form the structure of the algorithm. In what follows it will become abundantly clear that when we wish to compose programs in a reliable fashion, we can only do so by structuring the happenings we intend to evoke, and that we are in urgent need of a descriptive vehicle such that in the program text itself the structure of the happenings —i.e. if the computations— can be adequately reflected.
The above shortcomings led to the design of so-called "(higher level) programming languages". A programming language can be regarded as the machine code for a fictitous, idealized machine. Whereas the old machine codes were tailored to the needs of the hardware —i.e. the equipment electronic engineers could make— programming languages are more tailored to the intellectual needs and conceptual difficulties of the programmer who has to design the computations.
The problem now is that on the one hand we have the hardware machine A, that can be built but for which we don't like to program because it is too cumbersome, and on the other hand we have "dreamed up" the fictitious machine B, for which we would love to program but which the engineers cannot build. How do we bridge that gap?
The gap is bridged by "software": given machine A, we can make, once and for all, a program (in the machine code for machine A) which prescribes to machine A the pattern of behaviour it should follow if it is to simulate machine B. Such a program is called "software for machine A". Given hardware machine A, loaded with software, we have a mechanism —partly "hard", partly "soft"— that is able to execute programs written for the fictitious machine B.
Usually this combination of hard- and software processes such programs in two stages. In the first stage (the "translation stage") the program written in programming language B is subjected to a translation process. In this process a storage layout is decided, the necessary bookkeeping is carried out and an equivalent program —but now expressed in machine code A— is produced. In the second stage (the "execution stage") the output of the first one is interpreted by machine A as a program and the intended computation is evoked.
The standard software that goes with the machine shields the user from the idiosyncrasies of the specific machine; apart from that it invokes —behind the user's back, so to say— the standard ways of dealing with the tougher properties of the hardware, such as the possible paralellism (i.e. concurrence in time) of the computation proper and information transfers from and to peripheral devices and multilevel stores. Up till now we have described the hardware as if all storage cells were equally well accessible for the arithmetic unit. In practice this is seldom the case, two storage levels being quite common: primary store (usually ferrite cores) and secondary store (usually magnetic drums). The cells in primary store are the only ones that are directly and immediately accessible for the arithmentic unit; the information in secondary store (which in capacity is an order of magnitude larger than primary store) is not directly accessible for the arithmetic unit, but the possibility of bulk transfers from primary store and secondary store is availabe instead. In such machines, the software may move the information around between the two stores, all the time keeping track of where everything is to be found at any moment and trying to keep in primary store all "currently relevant" information. (This is called "the implementation of a virtual store".)
We have mentioned the concept of the virtual store because it is related to an efficiency aspect over which the programmer has some control and in regard to which the programmer therefore has some responsibility. This is called "vagrancy". A program has a small degree of vagrancy whenever for larger periods of time access are confined to a small, dense subset of the total amount of information; in that case the hope is justified that during that period of time this dense subset will be kept in primary store and that therefore the computation can go on at full speed. In computations with high vagrancy, the probability of information needed in secondary store is much larger and the transport facilities between the storage levels then tend to become the bottle-neck. Therefore, if possible, high vagrancy should be avoided.
Next chapter: 4. Variables and relations between their values