libflame is provided as free software, licensed under the GNU Lesser General Public License (LGPL) in two forms:
We strongly encourage our users to refer to the latest copy of the libflame user's guide for installation instructions and API reference.
FLAME is a methodology for developing dense linear algebra libraries that is radically different from the LINPACK/LAPACK approach that dates back to the 1970s. By libflame we denote the library that has resulted from this project. For addition information, visit the FLAME home page.
The following libflame features benefit both basic and advanced users, as well as library developers:
Figure 1: Blocked Cholesky Factorization (variant 2) expressed as a FLAME algorithm.
FLA_Error FLA_Chol_l_blk_var2( FLA_Obj A, int nb_alg ) { FLA_Obj ATL, ATR, A00, A01, A02, ABL, ABR, A10, A11, A12, A20, A21, A22; int b, value = 0; FLA_Part_2x2( A, &ATL, &ATR, &ABL, &ABR, 0, 0, FLA_TL ); while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){ b = min( FLA_Obj_length( ABR ), nb_alg ); FLA_Repart_2x2_to_3x3( ATL, /**/ ATR, &A00, /**/ &A01, &A02, /* ************* */ /* ******************** */ &A10, /**/ &A11, &A12, ABL, /**/ ABR, &A20, /**/ &A21, &A22, b, b, FLA_BR ); /* ---------------------------------------------------------------- */ FLA_Syrk( FLA_LOWER_TRIANGULAR, FLA_NO_TRANSPOSE, FLA_MINUS_ONE, A10, FLA_ONE, A11 ); FLA_Gemm( FLA_NO_TRANSPOSE, FLA_TRANSPOSE, FLA_MINUS_ONE, A20, A10, FLA_ONE, A21 ); value = FLA_Chol_unb_external( FLA_LOWER_TRIANGULAR, A11 ); if ( value != FLA_SUCCESS ) return ( FLA_Obj_length( A00 ) + value ); FLA_Trsm( FLA_RIGHT, FLA_LOWER_TRIANGULAR, FLA_TRANSPOSE, FLA_NONUNIT_DIAG, FLA_ONE, A11, A21 ); /* ---------------------------------------------------------------- */ FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR, A00, A01, /**/ A02, A10, A11, /**/ A12, /* ************** */ /* ****************** */ &ABL, /**/ &ABR, A20, A21, /**/ A22, FLA_TL ); } return value; } |
SUBROUTINE DPOTRF( UPLO, N, A, LDA, INFO ) CHARACTER UPLO INTEGER INFO, LDA, N DOUBLE PRECISION A( LDA, * ) DOUBLE PRECISION ONE PARAMETER ( ONE = 1.0D+0 ) LOGICAL UPPER INTEGER J, JB, NB LOGICAL LSAME INTEGER ILAENV EXTERNAL LSAME, ILAENV EXTERNAL DGEMM, DPOTF2, DSYRK, DTRSM, XERBLA INTRINSIC MAX, MIN INFO = 0 UPPER = LSAME( UPLO, 'U' ) IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN INFO = -1 ELSE IF( N.LT.0 ) THEN INFO = -2 ELSE IF( LDA.LT.MAX( 1, N ) ) THEN INFO = -4 END IF IF( INFO.NE.0 ) THEN CALL XERBLA( 'DPOTRF', -INFO ) RETURN END IF INFO = 0 UPPER = LSAME( UPLO, 'U' ) IF( N.EQ.0 ) $ RETURN NB = ILAENV( 1, 'DPOTRF', UPLO, N, -1, -1, -1 ) IF( NB.LE.1 .OR. NB.GE.N ) THEN CALL DPOTF2( UPLO, N, A, LDA, INFO ) ELSE IF( UPPER ) THEN *********** Upper triangular case omited for purposes of fair comparison. ELSE DO 20 J = 1, N, NB JB = MIN( NB, N-J+1 ) CALL DSYRK( 'Lower', 'No transpose', JB, J-1, -ONE, $ A( J, 1 ), LDA, ONE, A( J, J ), LDA ) CALL DPOTF2( 'Lower', JB, A( J, J ), LDA, INFO ) IF( INFO.NE.0 ) $ GO TO 30 IF( J+JB.LE.N ) THEN CALL DGEMM( 'No transpose', 'Transpose', N-J-JB+1, JB, $ J-1, -ONE, A( J+JB, 1 ), LDA, A( J, 1 ), $ LDA, ONE, A( J+JB, J ), LDA ) CALL DTRSM( 'Right', 'Lower', 'Transpose', 'Non-unit', $ N-J-JB+1, JB, ONE, A( J, J ), LDA, $ A( J+JB, J ), LDA ) END IF 20 CONTINUE END IF END IF GO TO 40 30 CONTINUE INFO = INFO + J - 1 40 CONTINUE RETURN END |
Figure 2: FLAME/C code for algorithm shown in Figure 1 (left) representing the style of coding found in libflame, and Fortran-77 LAPACK code (right) implementing the same algorithm.
Figure 3: Cholesky Factorization implementations compared on an 8-core Opteron system. Notes: For FLAME experiments, LAPACK was used only for the small unblocked Cholesky subproblem. GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. Peak system performance is 38.4 GFLOPS.
Figure 4: Cholesky Factorization implementations compared on a 16 core Itanium2 system. Notes: libflame uses variant 3 while LAPACK uses variant 2. For non-SuperMatrix experiments, GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. For SuperMatrix experiments, GotoBLAS parallelism was disabled. Theoretical peak system performance is 96 GFLOPS.
libflame contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implemenations support every datatype. Also, in many cases, we use a different naming convention for our routine names. The following table summarizes which routines are supported within libflame and also provides their corresponding netlib name for reference.
Notes:
operation name |
netlib routine name |
libflame routine name |
FLAME/C |
FLASH |
SuperMatrix |
type support |
l2f support |
---|---|---|---|---|---|---|---|
libflame routine prefix |
FLA_ |
FLASH_* |
FLASH_ |
||||
Level-3 BLAS |
|||||||
general matrix-matrix multiply |
?gemm |
Gemm |
y |
y |
y |
sdcz |
N/A |
hermitian matrix-matrix multiply |
?hemm |
Hemm |
y |
y |
y |
sdcz |
N/A |
hermitian rank-k update |
?herk |
Herk |
y |
y |
y |
sdcz |
N/A |
hermitian rank-2k update |
?her2k |
Her2k |
y |
y |
y |
sdcz |
N/A |
symmetric matrix-matrix multiply |
?symm |
Symm |
y |
y |
y |
sdcz |
N/A |
symmetric rank-k update |
?syrk |
Syrk |
y |
y |
y |
sdcz |
N/A |
symmetrix rank-2k update |
?syr2k |
Syr2k |
y |
y |
y |
sdcz |
N/A |
triangular matrix-matrix multiply |
?trmm |
Trmm |
y |
y |
y |
sdcz |
N/A |
triangular solve with multiple right-hand sides |
?trsm |
Trsm |
y |
y |
y |
sdcz |
N/A |
LAPACK |
|||||||
Cholesky factorization |
?potrf |
Chol |
y |
y |
y |
sdcz |
sdcz |
LU factorization with no pivoting |
~ |
LU_nopiv |
y |
y |
y |
sdcz |
N/A |
LU factorization with partial pivoting |
?getrf |
LU_piv |
y |
y |
y |
sdcz |
sdcz |
LU factorization with incremental pivoting |
~ |
LU_incpiv |
y |
y |
sdcz |
N/A |
|
QR factorization via the UT transform |
~ |
QR_UT |
y |
sdcz |
sdcz |
||
QR factorization (incremental) via the UT transform |
~ |
QR_UT_inc |
y |
y |
sdcz |
N/A |
|
Apply Q from QR_UT or LQ_UT to a right-hand side matrix |
~ |
Apply_Q_UT |
y |
sdcz |
sdcz |
||
Apply Q from QR_UT_inc to a right-hand side matrix |
~ |
Apply_Q_UT_inc |
y |
y |
sdcz |
N/A |
|
LQ factorization via the UT transform |
~ |
LQ_UT |
y |
sdcz |
sdcz |
||
Trinagular matrix inversion |
?trtri |
Trinv |
y |
y |
y |
sdcz |
sdcz |
triangular transpose matrix-matrix multiply |
?laaum |
Ttmm |
y |
y |
y |
sdcz |
sdcz |
SPD matrix inversion |
?dpotri + |
SPDinv |
y |
y |
y |
sdcz |
sdcz |
Triangular Sylvester equation solve |
?trsyl ^ |
Sylv |
y |
y |
y |
sdcz |
sdcz |
We provide an interface, lapack2flame, which allows legacy codes that link to LAPACK to utilize libflame without any code changes. However, lapack2flame does not provide interfaces to all routines within LAPACK. The column labeled "l2f support" in the above table shows which datatypes are supported for each operation.
Please see the libflame user's guide for the latest system requirements for both GNU/Linux and UNIX, and Windows platforms.
Please see the libflame user's guide for the latest instructions on downloading, configuring, compiling, and installing libflame.
The developers of libflame enthusiastically encourage users to use the GotoBLAS implementation of the Basic Linear Algebra Subprograms (BLAS). To obtain the source code for GotoBLAS, please visit the Texas Advanced Computing Center software site. After downloading perform the following steps:
tar xzf GotoBLAS-1.22.tar.gz
cd GotoBLAS
Please read the documentation that accompanies the GotoBLAS source.
Most users may build the GotoBLAS library by running quickbuild.32bit or quickbuild.64bit. Alternately, advanced users may instead view and edit Makefile.rule and then execute:
make lib
ln -s libgoto_ITANIUM2-r1.10.a libgoto.a
ln -s libgoto_ITANIUM2-r1.10.a libgoto_ia64.a
We highly recommend using libflame with GotoBLAS! However, libflame will work with any BLAS library. If you want to use libflame with a different BLAS, use the configure-time option --disable-goto-interfaces before building libflame. If you have further questions about interfacing libflame with your preferred BLAS library, contact flame@cs.utexas.edu.
Please see the libflame user's guide for the latest instructions on linking your legacy, LAPACK-dependent application to libflame.
We offer a step-by-step walkthrough for running two example programs included in the libflame source distribution: the first executes a sequential Cholesky factorization with conventional ("flat") matrix storage; the second executes a multithreaded Cholesky factorization using SuperMatrix and hierarchical storage.
We also encourage potential users to browse the code examples provided at our linear algebra wiki.
We have functionality beyond LAPACK. For example, we have routines for updating an LU factorization with pivoting. Adding additional operations is not our top priority at the moment. However, if you have an operation that you would like to see supported, it doesn't hurt to contact us with your request!
We are very insecure people. So, if you like the libraries and find them useful, send us a message! We even make it easy. In the top-level directory of the libflame distribution, execute:
make send-thanks
This will automatically e-mail us a message!
Contact flame@cs.utexas.edu.
Last Updated on 29 September 2009 by Field G. Van Zee.