libFLAME is provided as free software, licensed under the GNU Lesser General Public License (LGPL) in two forms:
FLAME is a methodology for developing dense linear algebra libraries that is radically different from the LINPACK/LAPACK approach that dates back to the 1970s. By libFLAME we denote the library that has resulted from this project. For additional information, visit the FLAME home page.
The following libFLAME features benefit both basic and advanced users, as well as library developers:
Figure 1: Blocked Cholesky Factorization (variant 2) expressed as a FLAME algorithm.
```c
#include "FLAME.h"

FLA_Error FLA_Chol_l_blk_var2( FLA_Obj A, int nb_alg )
{
  FLA_Obj ATL,   ATR,      A00, A01, A02,
          ABL,   ABR,      A10, A11, A12,
                           A20, A21, A22;

  int b, value = 0;

  FLA_Part_2x2( A,    &ATL, &ATR,
                      &ABL, &ABR,     0, 0, FLA_TL );

  while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){

    b = min( FLA_Obj_length( ABR ), nb_alg );

    FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                        /* ************* */   /* ******************** */
                                                &A10, /**/ &A11, &A12,
                           ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                           b, b, FLA_BR );
    /*------------------------------------------------------------*/

    /* A11 := A11 - A10 * A10^T */
    FLA_Syrk( FLA_LOWER_TRIANGULAR, FLA_NO_TRANSPOSE,
              FLA_MINUS_ONE, A10, FLA_ONE, A11 );

    /* A21 := A21 - A20 * A10^T */
    FLA_Gemm( FLA_NO_TRANSPOSE, FLA_TRANSPOSE,
              FLA_MINUS_ONE, A20, A10, FLA_ONE, A21 );

    /* A11 := Chol( A11 ) */
    value = FLA_Chol_unb_external( FLA_LOWER_TRIANGULAR, A11 );

    if ( value != FLA_SUCCESS )
      return ( FLA_Obj_length( A00 ) + value );

    /* A21 := A21 * inv( tril( A11 ) )^T */
    FLA_Trsm( FLA_RIGHT, FLA_LOWER_TRIANGULAR,
              FLA_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_ONE, A11, A21 );

    /*------------------------------------------------------------*/

    FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                     A10, A11, /**/ A12,
                            /* ************** */  /* ****************** */
                              &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                              FLA_TL );
  }

  return value;
}
```
```fortran
      SUBROUTINE DPOTRF( UPLO, N, A, LDA, INFO )
      CHARACTER          UPLO
      INTEGER            INFO, LDA, N
      DOUBLE PRECISION   A( LDA, * )
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
      LOGICAL            UPPER
      INTEGER            J, JB, NB
      LOGICAL            LSAME
      INTEGER            ILAENV
      EXTERNAL           LSAME, ILAENV
      EXTERNAL           DGEMM, DPOTF2, DSYRK, DTRSM, XERBLA
      INTRINSIC          MAX, MIN
      INFO = 0
      UPPER = LSAME( UPLO, 'U' )
      IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, N ) ) THEN
         INFO = -4
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'DPOTRF', -INFO )
         RETURN
      END IF
      INFO = 0
      UPPER = LSAME( UPLO, 'U' )
      IF( N.EQ.0 )
     $   RETURN
      NB = ILAENV( 1, 'DPOTRF', UPLO, N, -1, -1, -1 )
      IF( NB.LE.1 .OR. NB.GE.N ) THEN
         CALL DPOTF2( UPLO, N, A, LDA, INFO )
      ELSE
         IF( UPPER ) THEN
*
*        Upper triangular case omitted for purposes of fair comparison.
*
         ELSE
            DO 20 J = 1, N, NB
               JB = MIN( NB, N-J+1 )
               CALL DSYRK( 'Lower', 'No transpose', JB, J-1, -ONE,
     $                     A( J, 1 ), LDA, ONE, A( J, J ), LDA )
               CALL DPOTF2( 'Lower', JB, A( J, J ), LDA, INFO )
               IF( INFO.NE.0 )
     $            GO TO 30
               IF( J+JB.LE.N ) THEN
                  CALL DGEMM( 'No transpose', 'Transpose', N-J-JB+1, JB,
     $                        J-1, -ONE, A( J+JB, 1 ), LDA, A( J, 1 ),
     $                        LDA, ONE, A( J+JB, J ), LDA )
                  CALL DTRSM( 'Right', 'Lower', 'Transpose', 'Non-unit',
     $                        N-J-JB+1, JB, ONE, A( J, J ), LDA,
     $                        A( J+JB, J ), LDA )
               END IF
   20       CONTINUE
         END IF
      END IF
      GO TO 40
   30 CONTINUE
      INFO = INFO + J - 1
   40 CONTINUE
      RETURN
      END
```
Figure 2: FLAME/C code (first listing) for the algorithm shown in Figure 1, representing the style of coding found in libFLAME, and Fortran-77 LAPACK code (second listing) implementing the same algorithm.
Figure 3: Cholesky Factorization implementations compared on an 8-core Opteron system. Notes: For FLAME experiments, LAPACK was used only for the small unblocked Cholesky subproblem. GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. Peak system performance is 38.4 GFLOPS.
Figure 4: Cholesky Factorization implementations compared on a 16-core Itanium2 system. Notes: libFLAME uses variant 3 while LAPACK uses variant 2. For non-SuperMatrix experiments, GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. For SuperMatrix experiments, GotoBLAS parallelism was disabled. Theoretical peak system performance is 96 GFLOPS.
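Figures 3 and 4 report performance in GFLOPS relative to a theoretical peak. As a rough guide to how such rates are usually computed, here is a small C sketch that assumes the conventional operation count of n^3/3 flops for the Cholesky factorization of an n x n matrix, ignoring lower-order terms. The page does not state which count its plots use, and the function names are hypothetical.

```c
/* Approximate flop count of an n x n Cholesky factorization: n^3 / 3.
   Assumption: lower-order terms are ignored. */
double chol_flops(double n)
{
    return n * n * n / 3.0;
}

/* Achieved rate in GFLOPS for a factorization that took `seconds`. */
double chol_gflops(double n, double seconds)
{
    return chol_flops(n) / seconds / 1.0e9;
}

/* Fraction of theoretical peak, e.g. peak_gflops = 38.4 for the Opteron
   system in Figure 3 or 96.0 for the Itanium2 system in Figure 4. */
double fraction_of_peak(double gflops, double peak_gflops)
{
    return gflops / peak_gflops;
}
```

As a hypothetical example, factoring a 3000 x 3000 matrix in one second corresponds to 9 GFLOPS under this count, or roughly 23% of the 38.4 GFLOPS peak quoted for Figure 3.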
libFLAME contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implementations support every datatype. Also, in many cases our routine names follow a different naming convention. The following table summarizes which routines are supported within libFLAME and lists the corresponding netlib name of each for reference.
Notes:
operation name | netlib routine name | libFLAME routine name | FLAME/C | FLASH | SuperMatrix | type support | l2f support |
---|---|---|---|---|---|---|---|
libFLAME routine prefix | | | FLA_ | FLASH_* | FLASH_ | | |
**Level-3 BLAS** | | | | | | | |
general matrix-matrix multiply | ?gemm | Gemm | y | y | y | sdcz | N/A |
Hermitian matrix-matrix multiply | ?hemm | Hemm | y | y | y | sdcz | N/A |
Hermitian rank-k update | ?herk | Herk | y | y | y | sdcz | N/A |
Hermitian rank-2k update | ?her2k | Her2k | y | y | y | sdcz | N/A |
symmetric matrix-matrix multiply | ?symm | Symm | y | y | y | sdcz | N/A |
symmetric rank-k update | ?syrk | Syrk | y | y | y | sdcz | N/A |
symmetric rank-2k update | ?syr2k | Syr2k | y | y | y | sdcz | N/A |
triangular matrix-matrix multiply | ?trmm | Trmm | y | y | y | sdcz | N/A |
triangular solve with multiple right-hand sides | ?trsm | Trsm | y | y | y | sdcz | N/A |
**LAPACK** | | | | | | | |
triangular transpose matrix-matrix multiply | ?lauum | Ttmm | y | y | y | sdcz | sdcz |
Cholesky factorization | ?potrf | Chol | y | y | y | sdcz | sdcz |
LU factorization with no pivoting | ~ | LU_nopiv | y | y | y | sdcz | sdcz |
LU factorization with partial pivoting | ?getrf | LU_piv | y | | | sdcz | sdcz |
QR factorization | ?geqrf | QR | y | | | sd | d |
QR factorization via the UT transform | ~ | QR_UT | y | | | sd | d |
LQ factorization | ?gelqf | LQ | y | | | sd | d |
LQ factorization via the UT transform | ~ | LQ_UT | y | | | sd | d |
reduction to upper Hessenberg form | ?gehrd | Hess | y | | | d | d |
triangular matrix inversion | ?trtri | Trinv | y | y | y | sdcz | sdcz |
SPD (symmetric positive definite) matrix inversion | ?potri + | SPDinv | y | y | y | sdcz | sdcz |
triangular Sylvester equation solve | ?trsyl ^ | Sylv | y | y | y | sdcz | sdcz |
We provide an interface, liblapack2flame, which allows legacy codes that link to LAPACK to utilize libFLAME without any code changes. However, liblapack2flame does not provide interfaces to all routines within LAPACK. The column labeled "l2f support" in the above table shows which datatypes are supported for each operation.
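In the table, `?` in a netlib routine name stands for the datatype letter that begins every BLAS and LAPACK routine name: s (single real), d (double real), c (single complex), or z (double complex). The hypothetical helper below (not part of libFLAME) expands such a pattern with the letters from the "l2f support" column, which may help when checking whether the LAPACK routines a legacy code calls, such as `dpotrf`, fall within what liblapack2flame covers.

```c
#include <stdio.h>
#include <string.h>

/* Expand a netlib pattern such as "?potrf" using a type string from the
   "l2f support" column, e.g. "sdcz" -> spotrf, dpotrf, cpotrf, zpotrf.
   Characters other than s, d, c, z (e.g. in "N/A") are skipped.
   Writes at most max_names names into out and returns how many were written;
   returns 0 if the pattern contains no '?' (e.g. "~"). */
int expand_netlib_name(const char *pattern, const char *types,
                       char out[][16], int max_names)
{
    const char *q = strchr(pattern, '?');
    int count = 0;

    if (q == NULL)
        return 0;

    for (const char *t = types; *t != '\0' && count < max_names; t++) {
        if (strchr("sdcz", *t) == NULL)
            continue;                     /* not a datatype letter */
        /* text before '?', then the datatype letter, then the rest */
        snprintf(out[count], 16, "%.*s%c%s",
                 (int)(q - pattern), pattern, *t, q + 1);
        count++;
    }
    return count;
}
```

For instance, expanding `?geqrf` with the QR row's l2f entry `d` yields only `dgeqrf`, matching the table's note that liblapack2flame covers QR for double precision only.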
In addition, liblapack2flame provides interfaces to some routines that depend on the operations above. An incomplete list of these routines is:
dgees, dgeesx, dgeev, dgeevx, dggev, dggevx, dgelq2, dgeqp3, dgeqr2, dggqrf, dggrqf, dgesdd, dgesvd, dposv, dposvx, dsygvd, dsygv, dsygvx, dgegs, dgegv, dgges, dggesx, dggglm, dgglse, dgelsy, dgelsd, dgelss
Before you attempt to build libFLAME, be sure you have the following software tools:
After downloading the software, you may build and install the libraries by performing the following steps. (Here we assume you are building from a libflame 2.0 tarball.)

Unpack the tarball and change into the source directory.

```
tar xzf libflame-2.0.tar.gz
cd libflame-2.0
```
Configure the library. Run `./configure --help` for the full range of configure options.

```
./configure --prefix=<install_prefix>
```

Alternatively, you may edit and run the configure wrapper in `run-conf/run-configure.sh`. Specifying the install prefix is optional; if it is omitted, the default is `$HOME/flame` (which we generally recommend).
Compile the source code.
```
make -j n
```

The `-j` option is optional. When building libFLAME on an SMP or multicore system, you can parallelize compilation by specifying an integer n greater than 1; make then runs up to n jobs at once, compiling up to n files simultaneously.
Install the library archive files to `<install_prefix>` (`$HOME/flame` by default).

```
make install
```
At this point, the libFLAME libraries have been installed into the `lib` subdirectory of `<install_prefix>`.
We recommend symbolically linking the libraries to abbreviated names that do not contain the version. If you will only be linking code for one architecture, you may also omit the architecture from the link name. This can be done manually or with the help of some optional post-installation make targets. Execute

```
make install-symlinks
```

to create symbolic links that omit both the version and architecture strings, or

```
make install-symlinks-with-arch
```

to create links that omit the version but contain an architecture string. The latter allows you to distinguish among libraries compiled for different architectures.
In your application's makefile, refer to the symbolic link. When it comes time to install an updated version of libFLAME, you then need only update the symbolic links (i.e., execute `make install-symlinks` again) rather than the makefiles of every program that links to the FLAME libraries.
If you are interested in configuring libFLAME with non-default options, please see the output of `./configure --help`. We've summarized the most commonly used configure options here:
option | description | default |
---|---|---|
`--enable-optimizations` | Employ traditional compiler optimizations when compiling C and Fortran source code. | Enabled |
`--enable-warnings` | Use the appropriate flag(s) to request warnings when compiling C and Fortran source code. | Enabled |
`--enable-debug` | Use the appropriate debug flag (usually `-g`) when compiling C and Fortran source code. | Disabled |
`--enable-builtin-lapack-routines` | Build blocked and unblocked LAPACK routines for all operations supported within libFLAME and include them in libFLAME. When this option is disabled, LAPACK is required at link time. Note that FLAME implementations of LAPACK operations (such as the Cholesky, LU, and QR factorizations) use LAPACK code only for their unblocked subproblems, though libFLAME also includes wrappers to external blocked implementations for reference testing. This option is useful when you are setting up libFLAME for the first time, do not want to build LAPACK from source, and do not intend to use a third-party library, such as MKL, to provide basic LAPACK functionality. | Disabled |
`--enable-goto-interfaces` | Enable code that interfaces with internal, low-level libgoto functionality, such as symbols that may be queried for architecture-dependent blocksize values. | Enabled |
`--enable-supermatrix` | Enable Ernie Chan's SuperMatrix dependency-aware task scheduling and parallel execution system. | Disabled |
`--enable-multithreading=model` | Enable multithreading support. Valid values for model are `pthreads` and `openmp`. Threading must be enabled to access SMP/multicore parallelized implementations. | Disabled |
`--enable-memory-alignment=N` | Enable code that aligns dynamically allocated memory regions at N-byte boundaries. Note: N must be a power of two and a multiple of `sizeof(void*)`, which is usually 4 on 32-bit architectures and 8 on 64-bit architectures. | Disabled |
`--enable-internal-error-checking` | Enable internal runtime consistency checks of function parameters and return values. | Enabled |
`--enable-memory-counter` | Enable code that tracks the balance between calls to `FLA_malloc()` and `FLA_free()`. Upon calling `FLA_Finalize()`, the counter value is output to standard error. | Disabled |
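The two memory-related options above can be illustrated with standard POSIX calls. The sketch below is hypothetical and does not reproduce libFLAME's `FLA_malloc()`/`FLA_free()`: it checks the alignment rule stated for `--enable-memory-alignment=N` (a power of two that is also a multiple of `sizeof(void*)`, the same requirement POSIX places on `posix_memalign`), and keeps an allocation balance similar in spirit to `--enable-memory-counter`. All helper names are illustrative only.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helpers, NOT libFLAME's internal API. */

static long alloc_balance = 0;   /* +1 per successful allocation, -1 per free */

/* Rule from the configure table: N must be a power of two and a multiple
   of sizeof(void*). posix_memalign imposes the same requirement. */
int alignment_is_valid(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0 && n % sizeof(void *) == 0;
}

/* Allocate `size` bytes aligned to `alignment`; NULL on failure. */
void *counted_malloc_aligned(size_t size, size_t alignment)
{
    void *p = NULL;

    if (!alignment_is_valid(alignment))
        return NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    alloc_balance++;
    return p;
}

/* Release memory obtained from counted_malloc_aligned. */
void counted_free(void *p)
{
    if (p == NULL)
        return;
    free(p);
    alloc_balance--;
}

/* Outstanding allocations; zero means every allocation was freed. */
long counted_balance(void)
{
    return alloc_balance;
}
```

A nonzero balance at shutdown points to a leak (or a double free, if negative). libFLAME itself reports its own counter to standard error when `FLA_Finalize()` is called.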