Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
Per-Gunnar Martinsson, Gregorio Quintana-Orti, Nathan Heavner, Robert van de Geijn.
SIAM Journal on Scientific Computing, Vol. 39, Issue 2, C96-C115 (20 pages), 2017
Parallel Matrix Multiplication: A Systematic Journey
Martin D. Schatz, Robert A. van de Geijn, Jack Poulson.
SIAM Journal on Scientific Computing
Vol. 38, Issue 6, 2016 (online)
The BLIS Framework: Experiments in Portability
Field G. Van Zee,
Tyler Smith,
Bryan Marker,
Tze Meng Low,
Robert A. van de Geijn,
Francisco D. Igual,
Mikhail Smelyanskiy,
Xianyi Zhang,
Michael Kistler,
Vernon Austel,
John Gunnels,
Lee Killough.
ACM Transactions on Mathematical Software
Article No. 12, Volume 42, Issue 2, June 2016
A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores
Ardavan Pedram, John McCalpin, Andreas Gerstlauer.
The Journal of Signal Processing Systems.
BLIS: A Framework for Rapidly Instantiating BLAS Functionality
Field G. Van Zee,
Robert A. van de Geijn
ACM Transactions on Mathematical Software (TOMS)
Volume 41, Issue 3, June 2015
A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling
Kyungjoo Kim, Victor Eijkhout.
ACM Transactions on Mathematical Software
Volume 41 Issue 1, October 2014
Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi
Manuel F. Dolz, Francisco D. Igual, Thomas Ludwig, Luis Piñuel, Enrique S. Quintana-Ortí.
Computers & Electrical Engineering, 2015
Non-orthogonal spin-adaptation of coupled cluster methods: A new implementation of methods including quadruple
Devin A. Matthews and John F. Stanton
The Journal of Chemical Physics, 142 (6), 2015.
Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator.
Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn
IEEE Transactions on Computers, Special Section on Computer Arithmetic, August 2014.
Exploiting Symmetry in Tensors for High Performance.
Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda.
SIAM Journal on Scientific Computing, 36(5), Sep. 2014
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí
ACM Transactions on Mathematical Software (TOMS)
April 2014
High-Performance Solvers for Dense Hermitian Eigenproblems
Matthias Petschow, Elmar Peise, Paolo Bientinesi
SIAM Journal on Scientific Computing, Volume 35(1), pp. C1-C22, January 2013.
A Case Study in Mechanically Deriving Dense Linear Algebra Code
Bryan Marker, Don Batory, and Robert van de Geijn
The International Journal of High Performance Computing Applications Volume 27 Issue 4, November 2013
Deriving Linear Algebra Libraries
Paolo Bientinesi, John Gunnels, Maggie Myers, Enrique Quintana-Orti, Tyler Rhodes, Robert van de Geijn, and Field Van Zee
Formal Aspects of Computing
The FLAME Approach: From Dense Linear Algebra Algorithms to High-Performance Multi-Accelerator Implementations
Francisco D. Igual, Ernie Chan, Enrique S Quintana-Orti, Gregorio Quintana-Orti, Robert A van de Geijn, Field G van Zee
Journal of Parallel and Distributed Computing
High-performance up-and-downdating via Householder-like transformations
Robert A. van de Geijn, Field G. Van Zee
ACM Transactions on Mathematical Software (TOMS), 2011
Using desktop computers to solve large-scale dense linear algebra problems
Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Robert van de Geijn
The Journal of Supercomputing, Vol. 58, Issue 2, 2011
Goal-Oriented and Modular Stability Analysis
Paolo Bientinesi, Robert A. van de Geijn
SIAM Journal on Matrix Analysis and Applications, Volume 32 Issue 1, February 2011
Sparse Direct Factorizations through Unassembled Hyper-Matrices
Paolo Bientinesi, Victor Eijkhout, Kyungjoo Kim, Jason Kurtz, and Robert van de Geijn
Computer Methods in Applied Mechanics and Engineering, 199, 430-438, 2010
Toward Mechanical Derivation of Krylov Solver Libraries
Victor Eijkhout, Paolo Bientinesi, Robert van de Geijn
Procedia Computer Science, 1(1) 1805-1813, 2010 (Proceedings of ICCS2010.)
The libflame Library for Dense Matrix Computations
Field G. Van Zee,
Ernie Chan,
Robert A. van de Geijn,
Enrique S. Quintana-Orti,
Gregorio Quintana-Orti.
IEEE Computing in Science and Engineering, Vol. 11, No 6, November/December 2009
Collective communication: theory, practice, and experience
Ernie Chan, Marcel Heimlich, Avi Purkayastha, Robert van de Geijn
Concurrency and Computation: Practice & Experience, Volume 19 Issue 1, September 2007
A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations
Paolo Bientinesi, Inderjit S. Dhillon, Robert A. van de Geijn
SIAM Journal on Scientific Computing, Volume 27 Issue 1, July 2005
Strassen's Algorithm Reloaded
Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16)
Transforming a Linear Algebra Core to an FFT Accelerator.
Ardavan Pedram, John McCalpin, and Andreas Gerstlauer.
ASAP 2013, to appear.
Code Generation and Optimization of Distributed-Memory Dense Linear Algebra Kernels
Bryan Marker, Don Batory, and Robert van de Geijn.
International Workshop on Automatic Performance Tuning (iWAPT'13)
Floating Point Architecture Extensions for Optimized Matrix Factorization
Ardavan Pedram, Andreas Gerstlauer and Robert van de Geijn.
21st IEEE International Symposium on Computer Arithmetic, to be held in Austin, Texas, USA in April 2013. Accepted.
On the Efficiency of Register File versus Broadcast Interconnect for Collective Communications in Data-Parallel Hardware
Ardavan Pedram, Andreas Gerstlauer and Robert van de Geijn.
SBAC-PAD 2012. Accepted.
Level-3 BLAS on the TI C6678 multi-core DSP
Murtaza Ali, Eric Stotzer, Francisco D. Igual, and Robert van de Geijn.
SBAC-PAD 2012. Accepted.
Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC
Francisco D. Igual, Murtaza Ali, Arnon Friedmann, Eric Stotzer, Timothy Wentz, and Robert van de Geijn.
SC12. Accepted.
Designing Linear Algebra Algorithms by Mechanizing the Expert Developer
Bryan Marker, Jack Poulson, Don Batory, and Robert van de Geijn
A Linear Algebra Core Design for Efficient Level-3 BLAS
Ardavan Pedram, Syed Gilani, Nam Sung Kim, Robert van de Geijn, Michael Schulte, Andreas Gerstlauer. (poster)
ASAP, 2012.
A High-Performance, Low-Power Linear Algebra Core
Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn
22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2011), 2011
Retargeting PLAPACK to Clusters with Hardware Accelerators
Manuel Fogue, Francisco D. Igual, Enrique Quintana-Orti, and Robert van de Geijn.
2010 International Conference on High Performance Computing and Simulation (HPCS 2010), 2010
Transforming Linear Algebra Libraries: From Abstraction to Parallelism
Ernie Chan, Jim Nagle, Robert van de Geijn, and Field G. Van Zee.
HIPS'10: Proceedings of Fifteenth International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2010
Out-of-Core Computation of the QR Factorization on Multi-Core Processors
Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
Proceedings of the 15th International Euro-Par Conference on Parallel Processing (Euro-Par 2009), 2009
Solving "Large" Dense Matrix Problems on Multi-Core Processors and GPUs
Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
10th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing - PDSEC'09. Rome (Italy), 2009.
Using Graphics Processors to Accelerate the Solution of Out-of-Core Linear Systems
Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
IEEE International Symposium on Parallel and Distributed Computing, Lisbon (Portugal), 2009.
Fast Development of Dense Linear Algebra Codes on Graphics Processors
Maria Jesus Zafont, Alberto Martin, Francisco D. Igual, and Enrique S. Quintana-Orti.
14th International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2009.
An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization
Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Alfredo Remon, and Robert A. van de Geijn.
In High Performance Computing for Computational Science - VECPAR 2008.
Design of Scalable Dense Linear Algebra Libraries for Multithreaded Architectures: the LU Factorization
Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Robert van de Geijn, and Field G. Van Zee.
Workshop on Multithreaded Architectures and Applications, MTAAP 2008
Scheduling of QR factorization algorithms on SMP and multi-core architectures
Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Field G. Van Zee, and Robert A. van de Geijn.
Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008
Satisfying your Dependencies with SuperMatrix
Ernie Chan, Field G. Van Zee, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, Robert van de Geijn.
Proceedings of IEEE Cluster Computing 2007, pp. 91-99, Austin, Texas, September 2007.
Toward Scalable Matrix Multiply on Multithreaded Architectures
Bryan Marker, Field Van Zee, Kazushige Goto, Gregorio Quintana-Orti, Robert van de Geijn.
Proceedings of European Conference on Parallel and Distributed Computing (EuroPar 2007), pp. 748-757, 2007.
Formal Correctness and Stability of Linear Algebra Algorithms
Paolo Bientinesi and Robert van de Geijn.
A Family of High-Performance Matrix Multiplication Algorithms
John Gunnels, Fred Gustavson, Greg Henry, and Robert A. van de Geijn.
PARA 2004, LNCS 3732, pp. 256-265, 2005.
Rapid Development of High-Performance Linear Algebra Libraries
Paolo Bientinesi, John Gunnels, Fred Gustavson, Greg Henry, Margaret Myers,
Enrique S. Quintana-Orti, and Robert A. van de Geijn.
PARA 2004, LNCS 3732, pp. 376-384, 2005.
Automatic Derivation of Linear Algebra Algorithms with Application to Control Theory
Paolo Bientinesi, Sergey Kolos, and Robert A. van de Geijn
PARA 2004, LNCS 3732, pp. 385-394, 2005.
Rapid Development of High-Performance Out-of-Core Solvers
Thierry Joffrain, Enrique S. Quintana-Orti, and Robert A. van de Geijn.
PARA 2004, LNCS 3732, pp. 413-422, 2005.
A Family of High-Performance Matrix Algorithms
John A. Gunnels, Greg M. Henry, and Robert A. van de Geijn.
In Computational Science - 2001, Part I Lecture Notes in Computer Science 2073, pp. 51-60, Springer, 2001.
Fault-Tolerant High-Performance Matrix-Matrix Multiplication: Theory and Practice
John A. Gunnels, Daniel S. Katz, Enrique S. Quintana-Orti, and Robert van de Geijn.
International Conference for Dependable Systems and Networks (DSN-2001), pp. 47-56, July 2-4, 2001.
Formal Methods for High-Performance Linear Algebra Libraries
John Gunnels and Robert van de Geijn
The Architecture of Scientific Software: IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, October 2-4, 2000, Ottawa, Canada (Ronald F. Boisvert and P. T. Tang, editors), pp. 193-210, Kluwer Academic Press, 2001
Deriving Correct High-Performance Algorithms
Devangi N. Parikh, Maggie E. Myers, Robert A. van de Geijn
FLAME Working Note #86,
The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-07. 2017.
Strassen's Algorithm for Tensor Contraction
Jianyu Huang, Devin A. Matthews, and Robert A. van de Geijn.
FLAME Working Note #84,
The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-02. April 3, 2017.
BLISlab: A Sandbox for Optimizing GEMM
Jianyu Huang and Robert A. van de Geijn.
FLAME Working Note #80,
The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-16-13. August 31, 2016.
Implementing Strassen's Algorithm with BLIS.
Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn.
FLAME Working Note #79,
The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-16-03. April 16, 2016.
Toward ABFT for BLIS GEMM.
Tyler M. Smith, Robert A. van de Geijn, Mikhail Smelyanskiy, Enrique S. Quintana-Orti.
FLAME Working Note #76,
The University of Texas at Austin, Department of Computer Science. Technical Report TR-15-05. Originally published June 13, 2015 and revised Nov. 5, 2015.
Opportunities for Parallelism in Matrix Multiplication
Tyler M. Smith, Robert van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, and Field G. Van Zee.
FLAME Working Note #71. The University of Texas at Austin, Department of Computer Science. Technical Report TR-13-20. 2013.
To appear as:
Anatomy of High-Performance Many-Threaded Matrix Multiplication
Tyler M. Smith, Robert van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, and Field G. Van Zee.
International Parallel and Distributed Processing Symposium 2014.
Implementing Level-3 BLAS with BLIS: Early Experience
Field G. Van Zee,
Tyler Smith,
Francisco D. Igual,
Mikhail Smelyanskiy,
Xianyi Zhang,
Michael Kistler,
Vernon Austel,
John Gunnels,
Tze Meng Low,
Bryan Marker,
Lee Killough,
Robert A. van de Geijn.
FLAME Working Note #69. The University of Texas at Austin, Department of Computer Science. Technical Report TR-13-03. 2013.
BLIS: A Framework for Rapid Instantiation of BLAS Functionality
Field G. Van Zee, Robert A. van de Geijn.
ACM Transactions on Mathematical Software
Parallel Matrix Multiplication: 2D and 3D
Martin Schatz, Jack Poulson, and Robert van de Geijn.
FLAME Working Note #62. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-13. June 2012.
Unleashing DSPs for General-Purpose HPC.
Francisco D. Igual, Murtaza Ali, Arnon Friedmann, Eric Stotzer, Timothy Wentz, and Robert van de Geijn.
FLAME Working Note #61. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-02. February 2012.
Mechanizing the Expert Dense Linear Algebra Developer.
Bryan Marker, Andy Terrel, Jack Poulson, Don Batory, and Robert van de Geijn.
FLAME Working Note #58. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-18. April 2011. (Refined paper submitted to PPoPP'12.)
Deriving Linear Algebra Libraries.
Robert van de Geijn, Tyler Rhodes, Maggie Myers, and Field Van Zee.
FLAME Working Note #57. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-09. March 2011. (Submitted to FAC.)
Architecture Design by Transformation
Taylor L. Riche, Don Batory, Rui Goncalves, Bryan Marker.
FLAME Working Note #54. The University of Texas at Austin, Department of Computer Science. Technical Report TR-10-39. Dec. 14, 2010.
Algorithms for Reducing a Matrix to Condensed Form.
Field G. Van Zee, Robert van de Geijn, Gregorio Quintana-Orti, and G. Joseph Elizondo.
FLAME Working Note #53. The University of Texas at Austin, Department of Computer Science. Technical Report TR-10-37. Oct. 29, 2010.
Level-3 BLAS on a GPU: Picking the Low Hanging Fruit
Francisco D. Igual, Gregorio Quintana-Orti, and Robert van de Geijn.
FLAME Working Note #37. Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores. Technical Report DICC 2009-04-01. April 30, 2009, Updated May 21, 2009.
FLAMES2S: From Abstraction to High Performance.
Richard Veras, Jonathan Monette, Enrique Quintana-Orti, and Robert van de Geijn.
FLAME Working Note #35. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-49. Dec. 14, 2008.
Beautiful Parallel Code: Evolution vs. Intelligent Design.
Robert van de Geijn.
Presented at Supercomputing 2008 Workshop on Node Level Parallelism for Large Scale Supercomputers, Austin, Texas, November 2008. FLAME Working Note #34. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-46. Nov. 21, 2008.
SuperMatrix for the Factorization of Band Matrices.
Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Alfredo Remon, Robert van de Geijn.
FLAME Working Note #27. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-07-51. September 24, 2007.
Improving the Performance of Reduction to Hessenberg Form.
Gregorio Quintana-Orti and Robert van de Geijn.
FLAME Working Note #14. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2004-44. Oct 2004.
On Accumulating Householder Transformations.
Thierry Joffrain, Tze Meng Low, Enrique S. Quintana-Orti, Robert van de Geijn, and Field Van Zee.
FLAME Working Note #13. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2004-43. Oct 2004.
FLAME@lab: A Farewell to Indices.
Paolo Bientinesi, Enrique S. Quintana-Orti, and Robert van de Geijn.
FLAME Working Note #11. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2003-11. April 2003.
Representing Linear Algebra Algorithms in Code: The FLAME API.
Robert A. van de Geijn.
FLAME Working Note #10. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2003-01. Jan. 2003.
On Reducing TLB Misses in Matrix Multiplication.
Kazushige Goto and Robert van de Geijn.
FLAME Working Note #9. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2002-55. Nov. 2002.
The Science of Deriving Dense Linear Algebra Algorithms.
Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Orti, and Robert van de Geijn.
FLAME Working Note #8. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2002-53. Sept. 2002.
Flexible High-Performance Matrix Multiply via a Self-Modifying Runtime Code.
Greg M. Henry.
FLAME Working Note #7. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-46. Dec. 2001.
Formal Derivation of Algorithms: The Triangular Sylvester Equation.
Enrique S. Quintana-Orti and Robert van de Geijn.
FLAME Working Note #5. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-35. Sept. 2001.
High-Performance Matrix Multiplication Algorithms for Architectures with Hierarchical Memories.
John Gunnels, Greg Henry, and Robert van de Geijn.
FLAME Working Note #4. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-22. June 2001.
Developing Linear Algebra Algorithms: A Collection of Class Projects.
John Gunnels and Robert van de Geijn.
FLAME Working Note #3. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-19. May 2001.
Fault-Tolerant High-Performance Matrix-Matrix Multiplication.
John A. Gunnels, Daniel S. Katz, Enrique S. Quintana-Orti, and Robert van de Geijn.
FLAME Working Note #2. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2000-34. December 2000.
Formal Linear Algebra Methods Environment (FLAME): Overview.
John Gunnels, Greg Henry, and Robert van de Geijn.
FLAME Working Note #1. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2000-28. November 2000.
