NOTE: This transcription was contributed by Martin P.M. van der Burgt. Although its markup is incomplete, we believe it serves a useful purpose by virtue of its searchability and its accessibility to text-reading software. It will be replaced by a fully marked-up version when time permits. —HR           
  
On one-sided smoothing of event sequences.                           
We consider a finite number of events Ei (i from
0 through N) occurring at moments ti (i > j ⇒ ti > tj). 
The problem I found myself faced with was how to 
define a continuous function f(t) that could be 
interpreted as “the event frequency at moment t.” 
(I encountered this problem when trying to discover 
a strategy for deciding how much primary store 
should be allocated to independent programs in a  
multiprogrammed environment when for each program 
a “target page fault frequency” had been decided.) 
A possible definition of the event frequency is 
in terms of the Dirac function: 
|  | N |  | 
| f(t) = | Σ   δ (t - ti) | (1) | 
|  | i=0 |  | 
  
 
which has the obvious advantage that it satisfies
|  | +∞ |  | 
|  | ∫ f’(t).dt = N + 1 | (2) | 
|  | -∞ |  | 
  
but the equally obvious disadvantage of severe 
discontinuities, which makes the comparison of f0(t) 
and f1(t) — two different frequencies corresponding 
to different values of the parameter I could control — 
rather difficult. Hence our desire to smooth the 
data. The smoothing, however, had to be one-sided 
in the sense that the smoothed value f’(t) may
at any moment t only depend on the set of 
values t
i with t
i ≤ t. In order to make a 
sensible choice among the wealth of possibilities 
for f(t) we should impose upon f(t) as many 
“sensible” constraints as we can think of. To 
start with,  property (2) seems a good choice. 
The next one is that we should realize that 
frequencies —as usually understood— have a 
dimension “time 
-1”, i.e. if we consider at moments 
t’
i = at
i and define the corresponding frequency 
f’(t) for that second event sequence, then 
should hold. Poperty (3) is not in conflict with 
requirement (2), for
| +∞ | +∞ | +∞ | 
| ∫ f’(t’).dt’ = | ∫ f’(at).d(at) = | ∫ a.f’(at).dt = | 
| -∞ | -∞ | -∞ | 
|  | +∞ |  | 
|  | ∫ f(t).dt = N + 1 |  | 
|  | -∞ |  | 
Reasonable as requirement (3) may seem it is 
(for finite sequences) too much to hope for: if t
0 = 0 
the relation (3) will not hold for t
0 ≤ t ≤ t
1, for how 
are we going to guess “a”? Therefore we must loosen
it and can only require (3) to hold asymptotically 
for t > t
i with sufficiently large i. 
Relation (2) however, still stands rather firmly. Because 
in (2) we have only used  
We could consider functions w
i(t) such that 
|  | w(t) = 0  for  t < 0 | (4) | 
|  | ∞ |  | 
| and | ∫ w(t).dt = 1 | (5) | 
|  | 0 |  | 
  
|  
  |  | N |  | and replace (1) by  f(t) = | Σ   wi (t - ti) |  |  | i=0 |  | (6) | 
where (4) does credit to the one-sidedness of the 
required smoothing , (5) guarantees (2) and (6) makes 
the whole definition invariant for shifting of the 
origin. The question is whether we can choose the w
i
in such a way — satisfying (4) and (5) — as to 
approximate (3) at least asymptotically. 
A reasonable first guess — corresponding to fading 
out of the past in an exponential way — for wi(t) is 
with ki > 0: 
|  | wi(t) = 0    for t < 0 |  | 
|  | = ki.e-ki.t   for t ≥ 0 | (7) | 
  
where we can try to adapt k
i — on account of past 
history, if any — towards a better approximation of (3). 
I called (7) a first guess, because we also wanted  
a continuous function f(t) and with (6) and (7), 
f(t) is discontinuous for any t = t
i. A continuous 
f would follow from the proper linear compositum 
— proper in view of (4), (5) and w
i(t) = 0 — 
|  | wi(t) = 0   for t < 0 |  | 
|  | = (e-ki.t - e-hi.t).(hi.ki)/(hi - ki)  for t ≥ 0 | (8) | 
  
and now we are free in the choice of both h
i and k
i, 
provided that they are both positive and different 
from each other. I am severely tempted to 
restrict myself to 
|  | hi = c * ki   with  c ≠ 1 and c > 0 | (9) | 
  
the reason being that both h
i and k
i are of 
dimension “time
-1” and that I cannot forget 
my target (3). 
*              *
*
 
In order to come to grips with the constant c
I studied (omitting subscripts “i”) according to (8) & (9) 
  |  | w(t) = k . c/(c-1) . (e-kt - e-ckt) | 
and tried to determine  c  so as to minimize the 
maximum value of 
(I wanted my ripples as smooth as possible.) As this  
led to the inacceptable value c = 1, I took the 
limit for c → 1 and found — and this is my
final suggestion — 
|  | w(t) = 0   for t < 0 |  | 
|  | = ki2t.e-kit   for t ≥ 0 | (10) | 
  
 
|  | ∞ | 
 
| Also (10) has the property that | ∫ wi(t).dt = 1 | 
 
|  | 0 | 
 
 
*              *
*
 
The next thing to decide is, how to choose 
the value ki. A constant value for ki seems 
in view of (9) out of the question, because ki 
has the dimension of a frequency. Apart from 
initial difficulties it seems reasonable to choose  
ki proportional to f(ti) i.e. 
To get some feeling for a reasonable value 
for d, we consider f(t) for t≥0 as results 
from an 
infinite sequence of events that have 
occurred at t=0, -1, -2, -3, .... etc; for all passed 
events we may assume the same value k
i = k 
and we find for f(t), according to (6) and (10) 
 
|  | ∞ | 
 
| f(t) = k. | Σ   k.(t+i).e-k.(t+i) | 
 
|  | i=0 | 
 
 
which, indeed satisfies f(0) = f(1) as was to  
be expected. 
Note. I summed the sequence and found 
|  | k2.e-k.t                 e-k | 
| f(t) = | (————) . (t + ————) | 
|  | 1 - e-k                  1 - e-k | 
  
(End of note) 
The functions (10) have all one maximum, 
viz. at t = ki-1. We would like to attribute 
the maximum of f(t) for 0 ≤ t ≤ 1 to the term 
with i = 0 (i.e. the last event). For i > 0 all the  
terms should be decreasing functions of t for t > 0. 
We can expect the smoothest maximum of f(t) 
if it is in the neighbourhood of t = ½ and
this suggests k in the neighbourhood of 2 and 
For, because in the average f(t) = 1, we know 
that f(0) = f(1) < 1, therefore with d = 2 we 
shall find k < 2 and the term with i = 0 
will reach its maximum at t
0 > ½. Because 
the contribution of the remaining terms is a 
decreasing function of t, this will be compensated 
and f(t) will find its maximum at t
f < t
0. 
Note. Using I sliderule I have investigated the
consequences of the choice d = 2. Salvo errore et 
omissione I found for k=1.616, f(0) = f(1) = 0.808 
as minimum value for f(t)  and f(0.41) = 1.105 as 
maximum value. Anyone that would like to rely on 
these figures should reestablish them in a more  
reliable manner, I only mention these results because 
they represent the outcome of an hour’s work and give 
some impression of how smooth f(t) becomes 
in the case of equally spaced events. If it were 
of importance, I would try slightly smaller values of 
d as well. (End of note) 
| 19th February 1975 | prof.dr.Edsger W. Dijkstra | 
| Plataanstraat 5 | Burroughs Research Fellow | 
| NUENEN - 4565 |  | 
| The Netherlands |  | 
  
   Transcribed by Martin P.M. van der Burgt
    Last revision
    
      2014-12-08
    
  .