What is libFLAME?
libFLAME is a high-performance dense linear algebra library that is
the result of the FLAME methodology for systematically developing
dense linear algebra libraries. The FLAME methodology is radically
different from the LINPACK/LAPACK approach that dates back to the
1970s. For more information about the methodology, visit the Methodology page.
The source code for libflame is now hosted on GitHub.
We recommend using:
git clone https://github.com/flame/libflame.git
to download the source code. This will allow you to easily keep your
local clone up-to-date as the GitHub source code is updated.
So, after you have created your local clone, you can simply run:
git pull
to fetch and merge the latest changes into your local copy of libflame.
Users may post questions and comments about libflame to the
libflame-discuss
mailing list.
(For now, developers are also encouraged to use this list to communicate
with one another.)
What's provided by libFLAME?
The following libflame features benefit basic and advanced users as
well as library developers:
- A solution based on fundamental
computer science. The FLAME project advocates a new approach to
developing linear algebra libraries. Algorithms are obtained
systematically according to rigorous principles of formal
derivation. These methods are based on fundamental theorems of
computer science to guarantee that the resulting algorithm is
correct. In addition, the FLAME methodology uses a new, more stylized
notation for expressing loop-based linear algebra algorithms. This
notation closely resembles how algorithms are naturally illustrated
with pictures.
- Object-based abstractions and API. The BLAS, LAPACK, and ScaLAPACK
projects place a high priority on backward compatibility, which
hinders progress towards adopting modern software engineering
principles such as object abstraction. libflame is built around opaque
structures that hide implementation details of matrices, such as
leading dimensions, and exports object-based programming interfaces to
operate upon these structures. Likewise, FLAME algorithms are
expressed (and coded) in terms of smaller operations on sub-partitions
of the matrix operands. This abstraction facilitates programming
without array or loop indices, which allows the user to avoid painful
index-related programming errors altogether. The resulting code closely
mirrors the FLAME notation in which the algorithm is expressed; this
similarity is quite intentional, as it preserves the clarity of the
original algorithm as it would be illustrated on a whiteboard or in a
publication. (A brief code sketch of this style appears after this list.)
- Educational value. Aside from the potential to introduce
students to formal algorithm derivation, FLAME serves as an excellent
vehicle for teaching linear algebra algorithms in a classroom
setting. The clean abstractions afforded by the API also make FLAME
ideally suited for instruction of high-performance linear algebra
courses at the undergraduate and graduate level. Robert van de Geijn
routinely uses FLAME in his linear algebra and numerical analysis
courses. Some colleagues of the FLAME project are even beginning to
use the notation to teach classes elsewhere around the country,
including Timothy Mattson of Intel Corporation. Historically, the
BLAS/LAPACK style of coding has been used in these settings. However,
coding in this manner tends to obscure the algorithms; students often
get bogged down debugging the frustrating errors that often result
from indexing directly into arrays that represent the matrices.
- A complete dense linear algebra framework. Like LAPACK,
libflame provides ready-made implementations of common linear algebra
operations. The implementations found in libflame mirror many of those
found in the BLAS and LAPACK packages. However, unlike LAPACK,
libflame provides a framework for building complete custom linear
algebra codes. We believe such an environment is more useful as it
allows the user to quickly prototype a linear algebra solution to fit
the needs of their application. We are currently writing a complete
user's guide for libflame. In the meantime, users may browse the full
list of routines available in libflame through our online doxygen
documentation.
- High performance. In our
publications and performance graphs, we do our best to dispel the myth
that user- and programmer-friendly linear algebra codes cannot yield
high performance. Our FLAME implementations of operations such as
Cholesky factorization and triangular inversion often outperform the
corresponding implementations available in the LAPACK library.
Many instances of the libflame
performance advantage result from the fact that LAPACK provides only
one variant (algorithm) of every operation, while libflame provides
all known variants. This allows the user and/or library developer to
choose which algorithmic variant is most appropriate for a given
situation. libflame relies only on the presence of a core set of
highly optimized unblocked routines to perform the small sub-problems
found in FLAME algorithm codes.
- Dependency-aware multithreaded
parallelism. Until recently, the authors of the BLAS and LAPACK
advocated getting shared-memory parallelism from LAPACK routines by
simply linking to multithreaded BLAS. This low-level solution requires
no changes to LAPACK code but also suffers from sharp limitations in
terms of efficiency and scalability for small- and medium-sized matrix
problems. The fundamental bottleneck to introducing parallelism
directly within many algorithms is the web of data dependencies that
inevitably exists between sub-problems. The libflame project has
developed a runtime system, SuperMatrix, to detect and analyze
dependencies found within FLAME algorithms-by-blocks (algorithms whose
sub-problems operate only on block operands). Once dependencies are
known, the system schedules sub-operations to independent threads of
execution. This system is completely abstracted from the algorithm
that is being parallelized and requires virtually no change to the
algorithm code, but at the same time exposes abundant high-level
parallelism. We have observed that this method provides increased
performance for a range of small- and medium-sized problems. The most
recent version of LAPACK does not offer any similar mechanism. (A
sketch combining SuperMatrix with the storage-by-blocks interface
described below appears after this list.)
- Support for hierarchical
storage-by-blocks. Storing matrices by blocks, a concept advocated
years ago by Fred Gustavson of IBM, often yields performance gains
through improved spatial locality. Instead of representing a matrix as
a single linear array of data with a prescribed leading dimension, as
legacy libraries require for column- or row-major order, the storage
scheme is encoded into the matrix object. Here, internal elements
refer recursively to child objects that represent
sub-matrices. Currently, libflame provides a subset of the
conventional API that supports hierarchical matrices, allowing users
to create and manage such matrix objects as well as convert between
storage-by-blocks and conventional "flat" storage schemes.
- Advanced
build system. From its early revisions, libflame distributions have
been bundled with a robust build system, featuring automatic makefile
creation and a configuration script conforming to GNU standards
(allowing the user to run the ./configure; make; make install sequence
common to many open source software projects). Without any user input,
the configure script searches for and chooses compilers based on a
pre-defined preference order for each architecture. The user may
request specific compilers via the configure interface, or enable
other non-default features of libflame such as custom memory
alignment, multithreading (via POSIX threads or OpenMP), compiler
options (debugging symbols, warnings, optimizations), and memory leak
detection. The reference BLAS and LAPACK libraries provide no
configuration support and require the user to manually modify a
makefile with appropriate references to compilers and compiler options
depending on the host architecture.
- Windows support. While libflame
was originally developed for GNU/Linux and UNIX environments, in the
course of its development we have had the opportunity to port the
library to Microsoft Windows. The Windows port features a separate
build system implemented with Python and nmake, the Microsoft analogue
to the make utility found in UNIX-like environments. As of this
writing, the port is still very new and therefore should be considered
experimental. However, we feel libflame for Windows is very close to
usable for many in our audience, particularly those who consider
themselves experts. We invite interested users to try the software
and, of course, we welcome feedback to help improve our Windows
support, and libflame in general.
- Independence from Fortran and
LAPACK. The libflame development team is pleased to offer a
high-performance linear algebra solution that is 100%
Fortran-free. libflame is a C-only implementation and does not depend
on any external Fortran libraries, such as LAPACK. That said, we
happily provide an optional backward compatibility layer,
lapack2flame, that maps legacy LAPACK routine invocations to their
corresponding native C implementations in libflame. This allows legacy
applications to start taking advantage of libflame with virtually no
changes to their source code. Furthermore, we understand that some
users wish to leverage highly optimized implementations that conform
to the LAPACK interface, such as Intel's Math Kernel Library (MKL). As
such, we allow those users to configure libflame such that their
external LAPACK implementation is called for the small,
performance-sensitive unblocked subproblems that arise within
libflame's blocked algorithms and algorithms-by-blocks. (A sketch of a
legacy LAPACK call serviced by lapack2flame appears below.)
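To make the object-based programming style described above more
concrete, here is a minimal sketch of a Cholesky factorization through
the FLAME/C interface. It is an illustration only: the routine names
(FLA_Obj_create with explicit row and column strides, where passing 0
for both requests a default layout, FLA_Set_to_identity, and FLA_Chol)
are quoted from memory and should be checked against the doxygen
documentation.

#include "FLAME.h"

int main( void )
{
  FLA_Obj A;
  dim_t   n = 1000;

  FLA_Init();

  // Create an n-by-n double-precision matrix object; the object hides
  // storage details such as the leading dimension.
  FLA_Obj_create( FLA_DOUBLE, n, n, 0, 0, &A );

  // Give A trivially symmetric positive definite contents (the identity).
  FLA_Set_to_identity( A );

  // Factor A = L * L^T in place. No array indices or leading dimensions
  // appear anywhere in the calling code.
  FLA_Chol( FLA_LOWER_TRIANGULAR, A );

  FLA_Obj_free( &A );
  FLA_Finalize();

  return 0;
}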
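The SuperMatrix runtime and hierarchical storage-by-blocks are exposed
through the FLASH interface. The next sketch is likewise only an
illustration: the routines used (FLASH_Obj_create_hier_copy_of_flat,
FLASH_Queue_set_num_threads, FLASH_Chol, FLASH_Obj_flatten,
FLASH_Obj_free) are quoted from memory, and their availability depends
on configure-time options such as whether SuperMatrix was enabled, so
consult the doxygen documentation before relying on them.

#include "FLAME.h"

int main( void )
{
  FLA_Obj A_flat, A_hier;
  dim_t   n         = 2000;
  dim_t   blocksize = 256;

  FLA_Init();

  // Conventional "flat" matrix with trivially SPD contents.
  FLA_Obj_create( FLA_DOUBLE, n, n, 0, 0, &A_flat );
  FLA_Set_to_identity( A_flat );

  // Create a one-level hierarchical (storage-by-blocks) copy of A_flat
  // whose leaf blocks are blocksize-by-blocksize.
  FLASH_Obj_create_hier_copy_of_flat( A_flat, 1, &blocksize, &A_hier );

  // Ask the SuperMatrix runtime to schedule block sub-operations onto
  // four threads; dependencies between blocks are detected automatically.
  FLASH_Queue_set_num_threads( 4 );

  // Algorithm-by-blocks Cholesky factorization on the hierarchical object.
  FLASH_Chol( FLA_LOWER_TRIANGULAR, A_hier );

  // Copy the factored result back into conventional flat storage.
  FLASH_Obj_flatten( A_hier, A_flat );

  FLASH_Obj_free( &A_hier );
  FLA_Obj_free( &A_flat );
  FLA_Finalize();

  return 0;
}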
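Finally, as a sketch of the lapack2flame compatibility layer, the
program below calls the standard LAPACK routine for double-precision
Cholesky factorization (declared here with the common underscored
Fortran symbol name, dpotrf_). Nothing in this source is specific to
libflame; when the program is linked against a libflame built with
lapack2flame, the same call is serviced by the native C implementation
instead of a Fortran LAPACK library.

#include <stdio.h>

// Standard LAPACK prototype for double-precision Cholesky factorization.
extern void dpotrf_( char* uplo, int* n, double* a, int* lda, int* info );

int main( void )
{
  int    n = 3, lda = 3, info;
  // A 3x3 symmetric positive definite matrix in column-major order.
  double a[9] = { 4.0, 2.0, 2.0,
                  2.0, 5.0, 3.0,
                  2.0, 3.0, 6.0 };

  // Compute the lower Cholesky factor in place; when linked against
  // lapack2flame, this legacy call is mapped to libflame's C routines.
  dpotrf_( "L", &n, a, &lda, &info );

  printf( "dpotrf_ returned info = %d\n", info );
  return 0;
}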