The Boyer-Moore Fast String Searching Algorithm

This algorithm, which Bob Boyer and I invented in about 1975, is the basis of the fastest known ways to find one string of characters in another. How might you look for the following pattern in the text below?

pattern: EXAMPLE
   text: HERE IS A SIMPLE EXAMPLE

We can show you the Knuth-Pratt-Morris algorithm and we can show you the Boyer-Moore algorithm.

Our algorithm has the peculiar property that, roughly speaking, the longer the pattern is, the faster the algorithm goes. Furthermore, the algorithm is ``sublinear'' in the sense that it generally looks at fewer characters than it passes. The algorithm is described in

A Fast String Searching Algorithm, with R.S. Boyer. Communications of the Association for Computing Machinery, 20(10), 1977, pp. 762-772.

The classic Boyer-Moore algorithm suffers from the phenomenon that it tends not to work so efficiently on small alphabets like DNA. The skip distance tends to stop growing with the pattern length because substrings re-occur frequently. By remembering more of what has already been matched, one can get larger skips through the text. One can even arrange ``perfect memory'' and thus look at each character at most once, whereas the Boyer-Moore algorithm, while linear, may inspect a character from the text multiple times. This idea of remembering more has been explored in the literature by others. It suffers from the need for very large tables or state machines.

However, in 2007, Matyas Sustik and I thought of an apparently new variation that gives very good performance on small alphabets and has a small table. The algorithm is described in UTCS Tech Report TR-07-62, ``String Searching over Small Alphabets.''

[Best Ideas] [Publications] [Research] [Home]