Unit 2.5.2 Strassen's algorithm
¶So far, we have discussed algorithms for computing \(C := A B + C\) via a triple-nested loop that then perform \(2 m n k \) flops. A question is whether this is the best we can do.
A classic result regarding this is Strassen's algorithm [26]. It shows that, for problems of size \(m = n = k = 2^r \) for some integer \(r \text{,}\) matrix-matrix multiplication can be computed in time \(O( n^{\log_2{7}} ) \approx O( n^{2.807}) \text{.}\) Since Strassen proposed this, a succession of results further improved upon the exponent. In this discussion, we stick to the original result.
How can this be? For simplicity, assume \(m = n = k \) are all even. Partition
where \(X_{ij} \) are \(n/2 \times n/2 \) for \(X \in \{ C, A, B \} \) and \(ij \in \{ 00, 01, 10, 11 \} \text{.}\) Now that you understand how partitioned matrix-matrix multiplication works, you know that the following computations compute \(C := A B + C \text{:}\)
Each of the eight matrix-matrix multiplications requires \(2 (n/2)^3 = 1/4 n^3 \) flops and hence the total cost is our usual \(2 n^3 \) flops.
Surprisingly, the following computations also compute \(C := A B + C \text{:}\)
If you count carefully, this requires 22 additions of \(n/2 \times n/2 \) matrices and 7 multiplications with \(n/2 \times n/2 \) matrices. Adding matrices together requires \(O( n^2) \) flops, which are insignificant if \(n \) is large. Ignoring this, the cost now becomes
flops. The cost of the matrix-matrix multiplication is now \(7/8\) of what it was before!
But it gets better! Each of the matrix-matrix multiplications can themselves be computed via this scheme. If you do that, applying the idea at two levels, the cost is reduced to
flops. How many times can we do that? If \(n = 2^r \) we can half the size of the matrices \(r = \log_2( n ) \) times. If you do that, the cost becomes
Now,
Here we ignored the cost of the additions. However, it can be analyzed that this recursive approach requires \(O( n^{2.807} )\) flops.
Remark 2.5.1.
Learn more about the Strassen's algorithm entry on Wikipedia. In Unit 3.5.4 we will discuss how our insights support a practical implementation of Strassen's algorithm.