PaStiX Handbook
6.4.0

PastixComplex32 BLAS kernels  
void  core_cplrnt (int m, int n, pastix_complex32_t *A, int lda, int gM, int m0, int n0, unsigned long long int seed) 
Generate a random tile. More...  
void  core_cgetmo (int m, int n, const pastix_complex32_t *A, int lda, pastix_complex32_t *B, int ldb) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn. More...  
int  core_cgeadd (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, pastix_complex32_t alpha, const pastix_complex32_t *A, pastix_int_t LDA, pastix_complex32_t beta, pastix_complex32_t *B, pastix_int_t LDB) 
Add two matrices together. More...  
int  core_cgemdm (pastix_trans_t transA, pastix_trans_t transB, int M, int N, int K, pastix_complex32_t alpha, const pastix_complex32_t *A, int LDA, const pastix_complex32_t *B, int LDB, pastix_complex32_t beta, pastix_complex32_t *C, int LDC, const pastix_complex32_t *D, int incD, pastix_complex32_t *WORK, int LWORK) 
Perform one of the following matrixmatrix operations. More...  
int  core_cpqrcp (float tol, pastix_int_t maxrank, int full_update, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *jpvt, pastix_complex32_t *tau, pastix_complex32_t *work, pastix_int_t lwork, float *rwork) 
Compute a rankreavealing QR factorization. More...  
int  core_crqrcp (float tol, pastix_int_t maxrank, int refine, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *jpvt, pastix_complex32_t *tau, pastix_complex32_t *work, pastix_int_t lwork, float *rwork) 
Compute a randomized QR factorization. More...  
int  core_crqrrt (float tol, pastix_int_t maxrank, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_complex32_t *tau, pastix_complex32_t *B, pastix_int_t ldb, pastix_complex32_t *tau_b, pastix_complex32_t *work, pastix_int_t lwork, float normA) 
Compute a randomized QR factorization with rotation technique. More...  
int  core_ctqrcp (float tol, pastix_int_t maxrank, int unused, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *jpvt, pastix_complex32_t *tau, pastix_complex32_t *work, pastix_int_t lwork, float *rwork) 
Compute a randomized QR factorization with truncated updates. More...  
int  core_ctradd (pastix_uplo_t uplo, pastix_trans_t trans, pastix_int_t M, pastix_int_t N, pastix_complex32_t alpha, const pastix_complex32_t *A, pastix_int_t LDA, pastix_complex32_t beta, pastix_complex32_t *B, pastix_int_t LDB) 
Add two triangular matrices together as in PBLAS pctradd. More...  
int  core_cscalo (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, const pastix_complex32_t *A, pastix_int_t lda, const pastix_complex32_t *D, pastix_int_t ldd, pastix_complex32_t *B, pastix_int_t ldb) 
Scale a matrix by a diagonal out of place. More...  
PastixComplex32 Othogonalization kernels for lowrank updates  
pastix_fixdbl_t  core_clrorthu_fullqr (pastix_int_t M, pastix_int_t N, pastix_int_t rank, pastix_complex32_t *U, pastix_int_t ldu, pastix_complex32_t *V, pastix_int_t ldv) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR. More...  
pastix_fixdbl_t  core_clrorthu_partialqr (pastix_int_t M, pastix_int_t N, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, pastix_complex32_t *U, pastix_int_t ldu, pastix_complex32_t *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR. More...  
pastix_fixdbl_t  core_clrorthu_cgs (pastix_int_t M1, pastix_int_t N1, pastix_int_t M2, pastix_int_t N2, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, pastix_complex32_t *U, pastix_int_t ldu, pastix_complex32_t *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS. More...  
PastixComplex32 LAPACK kernels  
void  core_cpotrfsp (pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t . More...  
void  core_cpxtrfsp (pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting LL^t factorization of the matrix nbyn A = L * L^t . More...  
void  core_cgetrfsp (pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U. More...  
void  core_csytrfsp (pastix_int_t n, pastix_complex32_t *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t. More...  
PastixFloat BLAS kernels  
void  core_splrnt (int m, int n, float *A, int lda, int gM, int m0, int n0, unsigned long long int seed) 
Generate a random tile. More...  
void  core_sgetmo (int m, int n, const float *A, int lda, float *B, int ldb) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn. More...  
int  core_sgeadd (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, float alpha, const float *A, pastix_int_t LDA, float beta, float *B, pastix_int_t LDB) 
Add two matrices together. More...  
int  core_sgemdm (pastix_trans_t transA, pastix_trans_t transB, int M, int N, int K, float alpha, const float *A, int LDA, const float *B, int LDB, float beta, float *C, int LDC, const float *D, int incD, float *WORK, int LWORK) 
Perform one of the following matrixmatrix operations. More...  
int  core_spqrcp (float tol, pastix_int_t maxrank, int full_update, pastix_int_t nb, pastix_int_t m, pastix_int_t n, float *A, pastix_int_t lda, pastix_int_t *jpvt, float *tau, float *work, pastix_int_t lwork, float *rwork) 
Compute a rankreavealing QR factorization. More...  
int  core_srqrcp (float tol, pastix_int_t maxrank, int refine, pastix_int_t nb, pastix_int_t m, pastix_int_t n, float *A, pastix_int_t lda, pastix_int_t *jpvt, float *tau, float *work, pastix_int_t lwork, float *rwork) 
Compute a randomized QR factorization. More...  
int  core_srqrrt (float tol, pastix_int_t maxrank, pastix_int_t nb, pastix_int_t m, pastix_int_t n, float *A, pastix_int_t lda, float *tau, float *B, pastix_int_t ldb, float *tau_b, float *work, pastix_int_t lwork, float normA) 
Compute a randomized QR factorization with rotation technique. More...  
int  core_stqrcp (float tol, pastix_int_t maxrank, int unused, pastix_int_t nb, pastix_int_t m, pastix_int_t n, float *A, pastix_int_t lda, pastix_int_t *jpvt, float *tau, float *work, pastix_int_t lwork, float *rwork) 
Compute a randomized QR factorization with truncated updates. More...  
int  core_stradd (pastix_uplo_t uplo, pastix_trans_t trans, pastix_int_t M, pastix_int_t N, float alpha, const float *A, pastix_int_t LDA, float beta, float *B, pastix_int_t LDB) 
Add two triangular matrices together as in PBLAS pstradd. More...  
int  core_sscalo (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, const float *A, pastix_int_t lda, const float *D, pastix_int_t ldd, float *B, pastix_int_t ldb) 
Scale a matrix by a diagonal out of place. More...  
PastixFloat Othogonalization kernels for lowrank updates  
pastix_fixdbl_t  core_slrorthu_fullqr (pastix_int_t M, pastix_int_t N, pastix_int_t rank, float *U, pastix_int_t ldu, float *V, pastix_int_t ldv) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR. More...  
pastix_fixdbl_t  core_slrorthu_partialqr (pastix_int_t M, pastix_int_t N, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, float *U, pastix_int_t ldu, float *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR. More...  
pastix_fixdbl_t  core_slrorthu_cgs (pastix_int_t M1, pastix_int_t N1, pastix_int_t M2, pastix_int_t N2, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, float *U, pastix_int_t ldu, float *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS. More...  
PastixFloat LAPACK kernels  
void  core_spotrfsp (pastix_int_t n, float *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t . More...  
void  core_sgetrfsp (pastix_int_t n, float *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U. More...  
void  core_ssytrfsp (pastix_int_t n, float *A, pastix_int_t lda, pastix_int_t *nbpivots, float criterion) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t. More...  
PastixComplex64 BLAS kernels  
void  core_zplrnt (int m, int n, pastix_complex64_t *A, int lda, int gM, int m0, int n0, unsigned long long int seed) 
Generate a random tile. More...  
void  core_zgetmo (int m, int n, const pastix_complex64_t *A, int lda, pastix_complex64_t *B, int ldb) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn. More...  
int  core_zgeadd (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, pastix_complex64_t alpha, const pastix_complex64_t *A, pastix_int_t LDA, pastix_complex64_t beta, pastix_complex64_t *B, pastix_int_t LDB) 
Add two matrices together. More...  
int  core_zgemdm (pastix_trans_t transA, pastix_trans_t transB, int M, int N, int K, pastix_complex64_t alpha, const pastix_complex64_t *A, int LDA, const pastix_complex64_t *B, int LDB, pastix_complex64_t beta, pastix_complex64_t *C, int LDC, const pastix_complex64_t *D, int incD, pastix_complex64_t *WORK, int LWORK) 
Perform one of the following matrixmatrix operations. More...  
int  core_zpqrcp (double tol, pastix_int_t maxrank, int full_update, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *jpvt, pastix_complex64_t *tau, pastix_complex64_t *work, pastix_int_t lwork, double *rwork) 
Compute a rankreavealing QR factorization. More...  
int  core_zrqrcp (double tol, pastix_int_t maxrank, int refine, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *jpvt, pastix_complex64_t *tau, pastix_complex64_t *work, pastix_int_t lwork, double *rwork) 
Compute a randomized QR factorization. More...  
int  core_zrqrrt (double tol, pastix_int_t maxrank, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_complex64_t *tau, pastix_complex64_t *B, pastix_int_t ldb, pastix_complex64_t *tau_b, pastix_complex64_t *work, pastix_int_t lwork, double normA) 
Compute a randomized QR factorization with rotation technique. More...  
int  core_ztqrcp (double tol, pastix_int_t maxrank, int unused, pastix_int_t nb, pastix_int_t m, pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *jpvt, pastix_complex64_t *tau, pastix_complex64_t *work, pastix_int_t lwork, double *rwork) 
Compute a randomized QR factorization with truncated updates. More...  
int  core_ztradd (pastix_uplo_t uplo, pastix_trans_t trans, pastix_int_t M, pastix_int_t N, pastix_complex64_t alpha, const pastix_complex64_t *A, pastix_int_t LDA, pastix_complex64_t beta, pastix_complex64_t *B, pastix_int_t LDB) 
Add two triangular matrices together as in PBLAS pztradd. More...  
int  core_zscalo (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, const pastix_complex64_t *A, pastix_int_t lda, const pastix_complex64_t *D, pastix_int_t ldd, pastix_complex64_t *B, pastix_int_t ldb) 
Scale a matrix by a diagonal out of place. More...  
PastixComplex64 Othogonalization kernels for lowrank updates  
pastix_fixdbl_t  core_zlrorthu_fullqr (pastix_int_t M, pastix_int_t N, pastix_int_t rank, pastix_complex64_t *U, pastix_int_t ldu, pastix_complex64_t *V, pastix_int_t ldv) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR. More...  
pastix_fixdbl_t  core_zlrorthu_partialqr (pastix_int_t M, pastix_int_t N, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, pastix_complex64_t *U, pastix_int_t ldu, pastix_complex64_t *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR. More...  
pastix_fixdbl_t  core_zlrorthu_cgs (pastix_int_t M1, pastix_int_t N1, pastix_int_t M2, pastix_int_t N2, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, pastix_complex64_t *U, pastix_int_t ldu, pastix_complex64_t *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS. More...  
PastixComplex64 LAPACK kernels  
void  core_zpotrfsp (pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t . More...  
void  core_zpxtrfsp (pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting LL^t factorization of the matrix nbyn A = L * L^t . More...  
void  core_zgetrfsp (pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U. More...  
void  core_zsytrfsp (pastix_int_t n, pastix_complex64_t *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t. More...  
PastixDouble BLAS kernels  
void  core_dplrnt (int m, int n, double *A, int lda, int gM, int m0, int n0, unsigned long long int seed) 
Generate a random tile. More...  
void  core_dgetmo (int m, int n, const double *A, int lda, double *B, int ldb) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn. More...  
int  core_dgeadd (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, double alpha, const double *A, pastix_int_t LDA, double beta, double *B, pastix_int_t LDB) 
Add two matrices together. More...  
int  core_dgemdm (pastix_trans_t transA, pastix_trans_t transB, int M, int N, int K, double alpha, const double *A, int LDA, const double *B, int LDB, double beta, double *C, int LDC, const double *D, int incD, double *WORK, int LWORK) 
Perform one of the following matrixmatrix operations. More...  
int  core_dpqrcp (double tol, pastix_int_t maxrank, int full_update, pastix_int_t nb, pastix_int_t m, pastix_int_t n, double *A, pastix_int_t lda, pastix_int_t *jpvt, double *tau, double *work, pastix_int_t lwork, double *rwork) 
Compute a rankreavealing QR factorization. More...  
int  core_drqrcp (double tol, pastix_int_t maxrank, int refine, pastix_int_t nb, pastix_int_t m, pastix_int_t n, double *A, pastix_int_t lda, pastix_int_t *jpvt, double *tau, double *work, pastix_int_t lwork, double *rwork) 
Compute a randomized QR factorization. More...  
int  core_drqrrt (double tol, pastix_int_t maxrank, pastix_int_t nb, pastix_int_t m, pastix_int_t n, double *A, pastix_int_t lda, double *tau, double *B, pastix_int_t ldb, double *tau_b, double *work, pastix_int_t lwork, double normA) 
Compute a randomized QR factorization with rotation technique. More...  
int  core_dtqrcp (double tol, pastix_int_t maxrank, int unused, pastix_int_t nb, pastix_int_t m, pastix_int_t n, double *A, pastix_int_t lda, pastix_int_t *jpvt, double *tau, double *work, pastix_int_t lwork, double *rwork) 
Compute a randomized QR factorization with truncated updates. More...  
int  core_dtradd (pastix_uplo_t uplo, pastix_trans_t trans, pastix_int_t M, pastix_int_t N, double alpha, const double *A, pastix_int_t LDA, double beta, double *B, pastix_int_t LDB) 
Add two triangular matrices together as in PBLAS pdtradd. More...  
int  core_dscalo (pastix_trans_t trans, pastix_int_t M, pastix_int_t N, const double *A, pastix_int_t lda, const double *D, pastix_int_t ldd, double *B, pastix_int_t ldb) 
Scale a matrix by a diagonal out of place. More...  
PastixDouble Othogonalization kernels for lowrank updates  
pastix_fixdbl_t  core_dlrorthu_fullqr (pastix_int_t M, pastix_int_t N, pastix_int_t rank, double *U, pastix_int_t ldu, double *V, pastix_int_t ldv) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR. More...  
pastix_fixdbl_t  core_dlrorthu_partialqr (pastix_int_t M, pastix_int_t N, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, double *U, pastix_int_t ldu, double *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR. More...  
pastix_fixdbl_t  core_dlrorthu_cgs (pastix_int_t M1, pastix_int_t N1, pastix_int_t M2, pastix_int_t N2, pastix_int_t r1, pastix_int_t *r2ptr, pastix_int_t offx, pastix_int_t offy, double *U, pastix_int_t ldu, double *V, pastix_int_t ldv) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS. More...  
PastixDouble LAPACK kernels  
void  core_dpotrfsp (pastix_int_t n, double *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t . More...  
void  core_dgetrfsp (pastix_int_t n, double *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U. More...  
void  core_dsytrfsp (pastix_int_t n, double *A, pastix_int_t lda, pastix_int_t *nbpivots, double criterion) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t. More...  
This module contains all the BLAS and LAPACKlike kernels that are working on lapack layout matrices.
void core_cplrnt  (  int  m, 
int  n,  
pastix_complex32_t *  A,  
int  lda,  
int  gM,  
int  m0,  
int  n0,  
unsigned long long int  seed  
) 
Generate a random tile.
[in]  m  The number of rows of the tile A. m >= 0. 
[in]  n  The number of columns of the tile A. n >= 0. 
[in,out]  A  On entry, the mbyn tile to be initialized. On exit, the tile initialized in the mtxtype format. 
[in]  lda  The leading dimension of the tile A. lda >= max(1,m). 
[in]  gM  The global number of rows of the full matrix, A is belonging to. gM >= (m0+M). 
[in]  m0  The index of the first row of tile A in the full matrix. m0 >= 0. 
[in]  n0  The index of the first column of tile A in the full matrix. n0 >= 0. 
[in]  seed  The seed used for random generation. Must be the same for all tiles initialized with this routine. 
Definition at line 91 of file core_cplrnt.c.
void core_cgetmo  (  int  m, 
int  n,  
const pastix_complex32_t *  A,  
int  lda,  
pastix_complex32_t *  B,  
int  ldb  
) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn.
[in]  m  Number of rows of A. 
[in]  n  Number of columns of A. 
[in]  A  Matrix to be transposed. 
[in]  lda  Leading dimension of matrix A. 
[in,out]  B  On exit B = trans(A). 
[in]  ldb  Leading dimension of matrix B. 
Definition at line 49 of file core_cgetmo.c.
Referenced by core_clr2ge().
int core_cgeadd  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
pastix_complex32_t  alpha,  
const pastix_complex32_t *  A,  
pastix_int_t  LDA,  
pastix_complex32_t  beta,  
pastix_complex32_t *  B,  
pastix_int_t  LDB  
) 
Add two matrices together.
Perform the operation: B < alpha * op(A) + B
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A, if trans == PastixNoTrans, number of columns of A otherwise. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A, if trans == PastixNoTrans, number of rows of A otherwise. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans == PastixNoTrans, LDAbyM, otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,K). K = M if trans == PastixNoTrans, K = N otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 78 of file core_cgeadd.c.
References PastixConjTrans, and PastixNoTrans.
Referenced by core_clrconcatenate_v(), core_clrcpy(), core_ctradd(), cpublok_cadd_frfr(), cpucblk_cadd_frfr(), cpucblk_cdiff(), cpucblk_cgeaddsp1d(), cpucblk_cgetrfsp1d_getrf(), cpucblk_cgetschur_fr(), and cpucblk_cgetschur_lr().
int core_cgemdm  (  pastix_trans_t  transA, 
pastix_trans_t  transB,  
int  M,  
int  N,  
int  K,  
pastix_complex32_t  alpha,  
const pastix_complex32_t *  A,  
int  LDA,  
const pastix_complex32_t *  B,  
int  LDB,  
pastix_complex32_t  beta,  
pastix_complex32_t *  C,  
int  LDC,  
const pastix_complex32_t *  D,  
int  incD,  
pastix_complex32_t *  WORK,  
int  LWORK  
) 
Perform one of the following matrixmatrix operations.
C := alpha*op( A )*D*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X',
alpha and beta are scalars, and A, B, C and D are matrices, with
op( A ) an m by k matrix, op( B ) an k by n matrix, C an m by n matrix and D an k by k matrix.
[in]  transA 

[in]  transB 

[in]  M  The number of rows of the matrix op( A ) and of the matrix C. M must be at least zero. 
[in]  N  The number of columns of the matrix op( B ) and the number of columns of the matrix C. N must be at least zero. 
[in]  K  The number of columns of the matrix op( A ), the number of rows of the matrix op( B ), and the number of rows and columns of matrix D. K must be at least zero. 
[in]  alpha  On entry, alpha specifies the scalar alpha. 
[in]  A  pastix_complex32_t array of DIMENSION ( LDA, ka ), where ka is k when transA = PastixTrans, and is m otherwise. Before entry with transA = PastixTrans, the leading m by k part of the array A must contain the matrix A, otherwise the leading k by m part of the array A must contain the matrix A. 
[in]  LDA  On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When transA = PastixTrans then LDA must be at least max( 1, m ), otherwise LDA must be at least max( 1, k ). 
[in]  B  pastix_complex32_t array of DIMENSION ( LDB, kb ), where kb is n when transB = PastixTrans, and is k otherwise. Before entry with transB = PastixTrans, the leading k by n part of the array B must contain the matrix B, otherwise the leading n by k part of the array B must contain the matrix B. 
[in]  LDB  On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. When transB = PastixTrans then LDB must be at least max( 1, k ), otherwise LDB must be at least max( 1, n ). 
[in]  beta  On entry, beta specifies the scalar beta. When beta is supplied as zero then C need not be set on input. 
[in]  C  pastix_complex32_t array of DIMENSION ( LDC, n ). Before entry, the leading m by n part of the array C must contain the matrix C, except when beta is zero, in which case C need not be set on entry. On exit, the array C is overwritten by the m by n matrix ( alpha*op( A )*D*op( B ) + beta*C ). 
[in]  LDC  On entry, LDC specifies the first dimension of C as declared in the calling (sub) program. LDC must be at least max( 1, m ). 
[in]  D  pastix_complex32_t array of DIMENSION ( LDD, k ). Before entry, the leading k by k part of the array D must contain the matrix D. 
[in]  incD  On entry, LDD specifies the first dimension of D as declared in the calling (sub) program. LDD must be at least max( 1, k ). 
[in]  WORK  pastix_complex32_t array, dimension (MAX(1,LWORK)) 
[in]  LWORK  The length of WORK. On entry, if transA = PastixTrans and transB = PastixTrans then LWORK >= max(1, K*N). Otherwise LWORK >= max(1, M*K). 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
Definition at line 139 of file core_cgemdm.c.
References PASTIX_SUCCESS, PastixConjTrans, and PastixNoTrans.
Referenced by core_chetrfsp1d_gemm(), and core_csytrfsp1d_gemm().
int core_cpqrcp  (  float  tol, 
pastix_int_t  maxrank,  
int  full_update,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
pastix_complex32_t *  tau,  
pastix_complex32_t *  work,  
pastix_int_t  lwork,  
float *  rwork  
) 
Compute a rankreavealing QR factorization.
This routine is originated from the LAPACK kernels cgeqp3/claqps and was modified by A. Buttari for MUMPSBLR. In this version the stopping criterion is based on the frobenius norm of the residual, and not on the estimate of the twonorm making it more restrictive. Thus, the returned ranks are larger, but this gives a better accuracy.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  full_update  If true, all the trailing submatrix is updated, even if maxrank is reached. If false, the trailing submatrix is not updated as soon as it is not worth it. (Unused for now but kept to match API of RQRCP and TQRCP) 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 105 of file core_cpqrcp.c.
References pastix_int_t.
Referenced by core_cge2lr_pqrcp(), core_crqrcp(), core_crradd_pqrcp(), and core_ctqrcp().
int core_crqrcp  (  float  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
pastix_complex32_t *  tau,  
pastix_complex32_t *  work,  
pastix_int_t  lwork,  
float *  rwork  
) 
Compute a randomized QR factorization.
This kernel implements the algorithm described in: Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Lowrank Matrix Approximations. Jianwei Xiao, Ming Gu, and Julien Langou
The main difference in this algorithm relies in two points: 1) First, we stop the iterative porcess based on a tolerance criterion forwarded to the QR with column pivoting kernel, while they have a fixed number of iterations defined by parameter. 2) Second, we perform an extra PQRCP call on the trailing submatrix to finalize the computations, while in the paper above they use a spectrum revealing algorithm to refine the solution.
More information can also be found in Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. N. Halko, P. G. Martinsson, and J. A. Tropp
Based on this paper, we set the p parameter to 5, as it seems sufficient, and because we iterate multiple times to get the final rank.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  Enable/disable the extra refinement step that performs an additional PQRCP on the trailing submatrix. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 114 of file core_crqrcp.c.
References core_cpqrcp(), and pastix_int_t.
Referenced by core_cge2lr_rqrcp(), and core_crradd_rqrcp().
int core_crqrrt  (  float  tol, 
pastix_int_t  maxrank,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_complex32_t *  tau,  
pastix_complex32_t *  B,  
pastix_int_t  ldb,  
pastix_complex32_t *  tau_b,  
pastix_complex32_t *  work,  
pastix_int_t  lwork,  
float  normA  
) 
Compute a randomized QR factorization with rotation technique.
This kernel implements the second method (he did not write a pseudocode for the second method) described in :
Blocked rankrevealing QR factorizations: How randomized sampling can be used to avoid singlevector pivoting. P. G. Martinsson
Note that we only use the trailing matrix for resampling. We don't use power of it for getting better results, since it would be too costly.
Difference from randomized QRCP is : 1) resampling is used instead of sample update formula 2) Instead of pivoting A, rotation is applied on it 3) Instead of working with B and omega, directly their transposes are handled for simplicity
The main difference in this algorithm compared to figure 5 in the Martinsson's paper: 1) First, we stop the iterative process based on a tolerance criterion 2) Second, we do not apply SVD for gathering the mass on the diagonal blocks, so our method is less costly but we expect it to be less close to SVD result 3) Third, we do not apply the power iteration for resampling for a closer result to SVD, since it is too costly
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in,out]  A  The matrix of dimension ldabyn that needs to be compressed. On output, the A matrix is computed up to the founded rank k (k first columns and rows). Everything else, should be dropped. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix A. 
[out]  B  The matrix of dimension ldbbymaxrank that will holds the partial projection of the matrix A. On output, each block of 32 columns can be used to computed the reverse rotation of the R part of A. 
[in]  ldb  The leading dimension of the matrix B. ldb >= max(1, n). 
[out]  tau_b  Contains scalar factors of the elementary reflectors for the matrix B. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * max(n,n)) If lwork == 1, the function returns immediately and work[0] contains the optimal size of work. 
[in]  normA  The norm of the input matrixA. If negative, the norm is computed by the kernel. 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 126 of file core_crqrrt.c.
References pastix_int_t.
Referenced by core_cge2lr_rqrrt().
int core_ctqrcp  (  float  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
pastix_complex32_t *  tau,  
pastix_complex32_t *  work,  
pastix_int_t  lwork,  
float *  rwork  
) 
Compute a randomized QR factorization with truncated updates.
This routine is derivated from "Randomized QR with Column Pivoting", J. A. Duersch and M. Gu, SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. C263C291, 2017.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  TODO 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 99 of file core_ctqrcp.c.
References core_cpqrcp(), and pastix_int_t.
Referenced by core_cge2lr_tqrcp(), and core_crradd_tqrcp().
int core_ctradd  (  pastix_uplo_t  uplo, 
pastix_trans_t  trans,  
pastix_int_t  M,  
pastix_int_t  N,  
pastix_complex32_t  alpha,  
const pastix_complex32_t *  A,  
pastix_int_t  LDA,  
pastix_complex32_t  beta,  
pastix_complex32_t *  B,  
pastix_int_t  LDB  
) 
Add two triangular matrices together as in PBLAS pctradd.
B < alpha * op(A) + beta * B,
where op(X) = X, X', or conjf(X')
[in]  uplo  Specifies the shape of A and B matrices:

[in]  trans  Specifies whether the matrix A is nontransposed, transposed, or conjugate transposed

[in]  M  Number of rows of the matrices op(A) and B. 
[in]  N  Number of columns of the matrices op(A) and B. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans = PastixNoTrans, LDAbyM otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,M) if trans = PastixNoTrans, LDA >= max(1,N) otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. On exit, B = alpha * op(A) + beta * B 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
PastixLower
PastixUpper
Definition at line 79 of file core_ctradd.c.
References core_cgeadd(), PASTIX_SUCCESS, PastixConjTrans, PastixLower, PastixNoTrans, PastixUpper, and PastixUpperLower.
int core_cscalo  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
const pastix_complex32_t *  A,  
pastix_int_t  lda,  
const pastix_complex32_t *  D,  
pastix_int_t  ldd,  
pastix_complex32_t *  B,  
pastix_int_t  ldb  
) 
Scale a matrix by a diagonal out of place.
Perform the operation: B < op(A) * D, where A is a general matrix, and D a diagonal matrix.
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A. 
[in]  A  Matrix of size ldabyN. 
[in]  lda  Leading dimension of the array A. lda >= max(1,M). 
[in]  D  Diagonal matrix of size lddbyN. 
[in]  ldd  Leading dimension of the array D. ldd >= 1. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  ldb  Leading dimension of the array B. ldb >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 74 of file core_cscalo.c.
References pastix_int_t, PastixConjTrans, and PastixNoTrans.
Referenced by cpublok_cscalo(), and cpucblk_cscalo().
pastix_fixdbl_t core_clrorthu_fullqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  rank,  
pastix_complex32_t *  U,  
pastix_int_t  ldu,  
pastix_complex32_t *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
U = Q R, then U' = Q, and V' = R * V
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  rank  The number of columns of the U matrix, and the number of rows of the V part in the v1v2 matrix. 
[in,out]  U  The U matrix of size ldu by rank. On exit, Q from U = Q R. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, R * V, with R from U = Q R. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 83 of file core_clrothu.c.
References pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_clrorthu_partialqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
pastix_complex32_t *  U,  
pastix_int_t  ldu,  
pastix_complex32_t *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 197 of file core_clrothu.c.
References core_clrdbg_check_orthogonality_AB(), pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_clrorthu_cgs  (  pastix_int_t  M1, 
pastix_int_t  N1,  
pastix_int_t  M2,  
pastix_int_t  N2,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
pastix_complex32_t *  U,  
pastix_int_t  ldu,  
pastix_complex32_t *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M1  The number of rows of the U matrix. 
[in]  N1  The number of columns of the U matrix. 
[in]  M2  The number of rows of the u2 part of the U matrix. 
[in]  N2  The number of columns of the v2 part of the V matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 430 of file core_clrothu.c.
References core_clrdbg_check_orthogonality_AB(), and pastix_int_t.
void core_cpotrfsp  (  pastix_int_t  n, 
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t .
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with Cholesky factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 144 of file core_cpotrfsp.c.
References core_cpotf2sp(), and pastix_int_t.
void core_cpxtrfsp  (  pastix_int_t  n, 
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting LL^t factorization of the matrix nbyn A = L * L^t .
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LL^t factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 136 of file core_cpxtrfsp.c.
References core_cpxtf2sp(), and pastix_int_t.
void core_cgetrfsp  (  pastix_int_t  n, 
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LU factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the number of pivots is incremented. 
Definition at line 138 of file core_cgetrfsp.c.
References core_cgetf2sp(), and pastix_int_t.
void core_csytrfsp  (  pastix_int_t  n, 
pastix_complex32_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LDL^t factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 143 of file core_csytrfsp.c.
References core_csytf2sp(), and pastix_int_t.
void core_splrnt  (  int  m, 
int  n,  
float *  A,  
int  lda,  
int  gM,  
int  m0,  
int  n0,  
unsigned long long int  seed  
) 
Generate a random tile.
[in]  m  The number of rows of the tile A. m >= 0. 
[in]  n  The number of columns of the tile A. n >= 0. 
[in,out]  A  On entry, the mbyn tile to be initialized. On exit, the tile initialized in the mtxtype format. 
[in]  lda  The leading dimension of the tile A. lda >= max(1,m). 
[in]  gM  The global number of rows of the full matrix, A is belonging to. gM >= (m0+M). 
[in]  m0  The index of the first row of tile A in the full matrix. m0 >= 0. 
[in]  n0  The index of the first column of tile A in the full matrix. n0 >= 0. 
[in]  seed  The seed used for random generation. Must be the same for all tiles initialized with this routine. 
Definition at line 91 of file core_splrnt.c.
void core_sgetmo  (  int  m, 
int  n,  
const float *  A,  
int  lda,  
float *  B,  
int  ldb  
) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn.
[in]  m  Number of rows of A. 
[in]  n  Number of columns of A. 
[in]  A  Matrix to be transposed. 
[in]  lda  Leading dimension of matrix A. 
[in,out]  B  On exit B = trans(A). 
[in]  ldb  Leading dimension of matrix B. 
Definition at line 49 of file core_sgetmo.c.
Referenced by core_slr2ge().
int core_sgeadd  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
float  alpha,  
const float *  A,  
pastix_int_t  LDA,  
float  beta,  
float *  B,  
pastix_int_t  LDB  
) 
Add two matrices together.
Perform the operation: B < alpha * op(A) + B
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A, if trans == PastixNoTrans, number of columns of A otherwise. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A, if trans == PastixNoTrans, number of rows of A otherwise. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans == PastixNoTrans, LDAbyM, otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,K). K = M if trans == PastixNoTrans, K = N otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 78 of file core_sgeadd.c.
References PASTIX_SUCCESS, PastixNoTrans, and PastixTrans.
Referenced by core_slrconcatenate_v(), core_slrcpy(), core_stradd(), cpublok_sadd_frfr(), cpucblk_sadd_frfr(), cpucblk_sdiff(), cpucblk_sgeaddsp1d(), cpucblk_sgetrfsp1d_getrf(), cpucblk_sgetschur_fr(), and cpucblk_sgetschur_lr().
int core_sgemdm  (  pastix_trans_t  transA, 
pastix_trans_t  transB,  
int  M,  
int  N,  
int  K,  
float  alpha,  
const float *  A,  
int  LDA,  
const float *  B,  
int  LDB,  
float  beta,  
float *  C,  
int  LDC,  
const float *  D,  
int  incD,  
float *  WORK,  
int  LWORK  
) 
Perform one of the following matrixmatrix operations.
C := alpha*op( A )*D*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X',
alpha and beta are scalars, and A, B, C and D are matrices, with
op( A ) an m by k matrix, op( B ) an k by n matrix, C an m by n matrix and D an k by k matrix.
[in]  transA 

[in]  transB 

[in]  M  The number of rows of the matrix op( A ) and of the matrix C. M must be at least zero. 
[in]  N  The number of columns of the matrix op( B ) and the number of columns of the matrix C. N must be at least zero. 
[in]  K  The number of columns of the matrix op( A ), the number of rows of the matrix op( B ), and the number of rows and columns of matrix D. K must be at least zero. 
[in]  alpha  On entry, alpha specifies the scalar alpha. 
[in]  A  float array of DIMENSION ( LDA, ka ), where ka is k when transA = PastixTrans, and is m otherwise. Before entry with transA = PastixTrans, the leading m by k part of the array A must contain the matrix A, otherwise the leading k by m part of the array A must contain the matrix A. 
[in]  LDA  On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When transA = PastixTrans then LDA must be at least max( 1, m ), otherwise LDA must be at least max( 1, k ). 
[in]  B  float array of DIMENSION ( LDB, kb ), where kb is n when transB = PastixTrans, and is k otherwise. Before entry with transB = PastixTrans, the leading k by n part of the array B must contain the matrix B, otherwise the leading n by k part of the array B must contain the matrix B. 
[in]  LDB  On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. When transB = PastixTrans then LDB must be at least max( 1, k ), otherwise LDB must be at least max( 1, n ). 
[in]  beta  On entry, beta specifies the scalar beta. When beta is supplied as zero then C need not be set on input. 
[in]  C  float array of DIMENSION ( LDC, n ). Before entry, the leading m by n part of the array C must contain the matrix C, except when beta is zero, in which case C need not be set on entry. On exit, the array C is overwritten by the m by n matrix ( alpha*op( A )*D*op( B ) + beta*C ). 
[in]  LDC  On entry, LDC specifies the first dimension of C as declared in the calling (sub) program. LDC must be at least max( 1, m ). 
[in]  D  float array of DIMENSION ( LDD, k ). Before entry, the leading k by k part of the array D must contain the matrix D. 
[in]  incD  On entry, LDD specifies the first dimension of D as declared in the calling (sub) program. LDD must be at least max( 1, k ). 
[in]  WORK  float array, dimension (MAX(1,LWORK)) 
[in]  LWORK  The length of WORK. On entry, if transA = PastixTrans and transB = PastixTrans then LWORK >= max(1, K*N). Otherwise LWORK >= max(1, M*K). 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
Definition at line 139 of file core_sgemdm.c.
References PASTIX_SUCCESS, PastixNoTrans, and PastixTrans.
Referenced by core_ssytrfsp1d_gemm().
int core_spqrcp  (  float  tol, 
pastix_int_t  maxrank,  
int  full_update,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
float *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
float *  tau,  
float *  work,  
pastix_int_t  lwork,  
float *  rwork  
) 
Compute a rankreavealing QR factorization.
This routine is originated from the LAPACK kernels sgeqp3/slaqps and was modified by A. Buttari for MUMPSBLR. In this version the stopping criterion is based on the frobenius norm of the residual, and not on the estimate of the twonorm making it more restrictive. Thus, the returned ranks are larger, but this gives a better accuracy.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  full_update  If true, all the trailing submatrix is updated, even if maxrank is reached. If false, the trailing submatrix is not updated as soon as it is not worth it. (Unused for now but kept to match API of RQRCP and TQRCP) 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 105 of file core_spqrcp.c.
References pastix_int_t.
Referenced by core_sge2lr_pqrcp(), core_srqrcp(), core_srradd_pqrcp(), and core_stqrcp().
int core_srqrcp  (  float  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
float *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
float *  tau,  
float *  work,  
pastix_int_t  lwork,  
float *  rwork  
) 
Compute a randomized QR factorization.
This kernel implements the algorithm described in: Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Lowrank Matrix Approximations. Jianwei Xiao, Ming Gu, and Julien Langou
The main difference in this algorithm relies in two points: 1) First, we stop the iterative porcess based on a tolerance criterion forwarded to the QR with column pivoting kernel, while they have a fixed number of iterations defined by parameter. 2) Second, we perform an extra PQRCP call on the trailing submatrix to finalize the computations, while in the paper above they use a spectrum revealing algorithm to refine the solution.
More information can also be found in Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. N. Halko, P. G. Martinsson, and J. A. Tropp
Based on this paper, we set the p parameter to 5, as it seems sufficient, and because we iterate multiple times to get the final rank.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  Enable/disable the extra refinement step that performs an additional PQRCP on the trailing submatrix. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 114 of file core_srqrcp.c.
References core_spqrcp(), and pastix_int_t.
Referenced by core_sge2lr_rqrcp(), and core_srradd_rqrcp().
int core_srqrrt  (  float  tol, 
pastix_int_t  maxrank,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
float *  A,  
pastix_int_t  lda,  
float *  tau,  
float *  B,  
pastix_int_t  ldb,  
float *  tau_b,  
float *  work,  
pastix_int_t  lwork,  
float  normA  
) 
Compute a randomized QR factorization with rotation technique.
This kernel implements the second method (he did not write a pseudocode for the second method) described in :
Blocked rankrevealing QR factorizations: How randomized sampling can be used to avoid singlevector pivoting. P. G. Martinsson
Note that we only use the trailing matrix for resampling. We don't use power of it for getting better results, since it would be too costly.
Difference from randomized QRCP is : 1) resampling is used instead of sample update formula 2) Instead of pivoting A, rotation is applied on it 3) Instead of working with B and omega, directly their transposes are handled for simplicity
The main difference in this algorithm compared to figure 5 in the Martinsson's paper: 1) First, we stop the iterative process based on a tolerance criterion 2) Second, we do not apply SVD for gathering the mass on the diagonal blocks, so our method is less costly but we expect it to be less close to SVD result 3) Third, we do not apply the power iteration for resampling for a closer result to SVD, since it is too costly
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in,out]  A  The matrix of dimension ldabyn that needs to be compressed. On output, the A matrix is computed up to the founded rank k (k first columns and rows). Everything else, should be dropped. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix A. 
[out]  B  The matrix of dimension ldbbymaxrank that will holds the partial projection of the matrix A. On output, each block of 32 columns can be used to computed the reverse rotation of the R part of A. 
[in]  ldb  The leading dimension of the matrix B. ldb >= max(1, n). 
[out]  tau_b  Contains scalar factors of the elementary reflectors for the matrix B. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * max(n,n)) If lwork == 1, the function returns immediately and work[0] contains the optimal size of work. 
[in]  normA  The norm of the input matrixA. If negative, the norm is computed by the kernel. 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 126 of file core_srqrrt.c.
References pastix_int_t.
Referenced by core_sge2lr_rqrrt().
int core_stqrcp  (  float  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
float *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
float *  tau,  
float *  work,  
pastix_int_t  lwork,  
float *  rwork  
) 
Compute a randomized QR factorization with truncated updates.
This routine is derivated from "Randomized QR with Column Pivoting", J. A. Duersch and M. Gu, SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. C263C291, 2017.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  TODO 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 99 of file core_stqrcp.c.
References core_spqrcp(), and pastix_int_t.
Referenced by core_sge2lr_tqrcp(), and core_srradd_tqrcp().
int core_stradd  (  pastix_uplo_t  uplo, 
pastix_trans_t  trans,  
pastix_int_t  M,  
pastix_int_t  N,  
float  alpha,  
const float *  A,  
pastix_int_t  LDA,  
float  beta,  
float *  B,  
pastix_int_t  LDB  
) 
Add two triangular matrices together as in PBLAS pstradd.
B < alpha * op(A) + beta * B,
where op(X) = X, X', or (X')
[in]  uplo  Specifies the shape of A and B matrices:

[in]  trans  Specifies whether the matrix A is nontransposed, transposed, or conjugate transposed

[in]  M  Number of rows of the matrices op(A) and B. 
[in]  N  Number of columns of the matrices op(A) and B. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans = PastixNoTrans, LDAbyM otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,M) if trans = PastixNoTrans, LDA >= max(1,N) otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. On exit, B = alpha * op(A) + beta * B 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
PastixLower
PastixUpper
Definition at line 79 of file core_stradd.c.
References core_sgeadd(), PASTIX_SUCCESS, PastixLower, PastixNoTrans, PastixTrans, PastixUpper, and PastixUpperLower.
int core_sscalo  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
const float *  A,  
pastix_int_t  lda,  
const float *  D,  
pastix_int_t  ldd,  
float *  B,  
pastix_int_t  ldb  
) 
Scale a matrix by a diagonal out of place.
Perform the operation: B < op(A) * D, where A is a general matrix, and D a diagonal matrix.
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A. 
[in]  A  Matrix of size ldabyN. 
[in]  lda  Leading dimension of the array A. lda >= max(1,M). 
[in]  D  Diagonal matrix of size lddbyN. 
[in]  ldd  Leading dimension of the array D. ldd >= 1. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  ldb  Leading dimension of the array B. ldb >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 74 of file core_sscalo.c.
References pastix_int_t, PASTIX_SUCCESS, PastixNoTrans, and PastixTrans.
Referenced by cpublok_sscalo(), and cpucblk_sscalo().
pastix_fixdbl_t core_slrorthu_fullqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  rank,  
float *  U,  
pastix_int_t  ldu,  
float *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
U = Q R, then U' = Q, and V' = R * V
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  rank  The number of columns of the U matrix, and the number of rows of the V part in the v1v2 matrix. 
[in,out]  U  The U matrix of size ldu by rank. On exit, Q from U = Q R. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, R * V, with R from U = Q R. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 83 of file core_slrothu.c.
References pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_slrorthu_partialqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
float *  U,  
pastix_int_t  ldu,  
float *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 197 of file core_slrothu.c.
References core_slrdbg_check_orthogonality_AB(), pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_slrorthu_cgs  (  pastix_int_t  M1, 
pastix_int_t  N1,  
pastix_int_t  M2,  
pastix_int_t  N2,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
float *  U,  
pastix_int_t  ldu,  
float *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M1  The number of rows of the U matrix. 
[in]  N1  The number of columns of the U matrix. 
[in]  M2  The number of rows of the u2 part of the U matrix. 
[in]  N2  The number of columns of the v2 part of the V matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 430 of file core_slrothu.c.
References core_slrdbg_check_orthogonality_AB(), and pastix_int_t.
void core_spotrfsp  (  pastix_int_t  n, 
float *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t .
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with Cholesky factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 144 of file core_spotrfsp.c.
References core_spotf2sp(), and pastix_int_t.
void core_sgetrfsp  (  pastix_int_t  n, 
float *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LU factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the number of pivots is incremented. 
Definition at line 138 of file core_sgetrfsp.c.
References core_sgetf2sp(), and pastix_int_t.
void core_ssytrfsp  (  pastix_int_t  n, 
float *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
float  criterion  
) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LDL^t factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 143 of file core_ssytrfsp.c.
References core_ssytf2sp(), and pastix_int_t.
void core_zplrnt  (  int  m, 
int  n,  
pastix_complex64_t *  A,  
int  lda,  
int  gM,  
int  m0,  
int  n0,  
unsigned long long int  seed  
) 
Generate a random tile.
[in]  m  The number of rows of the tile A. m >= 0. 
[in]  n  The number of columns of the tile A. n >= 0. 
[in,out]  A  On entry, the mbyn tile to be initialized. On exit, the tile initialized in the mtxtype format. 
[in]  lda  The leading dimension of the tile A. lda >= max(1,m). 
[in]  gM  The global number of rows of the full matrix, A is belonging to. gM >= (m0+M). 
[in]  m0  The index of the first row of tile A in the full matrix. m0 >= 0. 
[in]  n0  The index of the first column of tile A in the full matrix. n0 >= 0. 
[in]  seed  The seed used for random generation. Must be the same for all tiles initialized with this routine. 
Definition at line 91 of file core_zplrnt.c.
void core_zgetmo  (  int  m, 
int  n,  
const pastix_complex64_t *  A,  
int  lda,  
pastix_complex64_t *  B,  
int  ldb  
) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn.
[in]  m  Number of rows of A. 
[in]  n  Number of columns of A. 
[in]  A  Matrix to be transposed. 
[in]  lda  Leading dimension of matrix A. 
[in,out]  B  On exit B = trans(A). 
[in]  ldb  Leading dimension of matrix B. 
Definition at line 49 of file core_zgetmo.c.
Referenced by core_zlr2ge().
int core_zgeadd  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
pastix_complex64_t  alpha,  
const pastix_complex64_t *  A,  
pastix_int_t  LDA,  
pastix_complex64_t  beta,  
pastix_complex64_t *  B,  
pastix_int_t  LDB  
) 
Add two matrices together.
Perform the operation: B < alpha * op(A) + B
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A, if trans == PastixNoTrans, number of columns of A otherwise. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A, if trans == PastixNoTrans, number of rows of A otherwise. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans == PastixNoTrans, LDAbyM, otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,K). K = M if trans == PastixNoTrans, K = N otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 78 of file core_zgeadd.c.
References PastixConjTrans, and PastixNoTrans.
Referenced by core_zlrconcatenate_v(), core_zlrcpy(), core_ztradd(), cpublok_zadd_frfr(), cpucblk_zadd_frfr(), cpucblk_zdiff(), cpucblk_zgeaddsp1d(), cpucblk_zgetrfsp1d_getrf(), cpucblk_zgetschur_fr(), and cpucblk_zgetschur_lr().
int core_zgemdm  (  pastix_trans_t  transA, 
pastix_trans_t  transB,  
int  M,  
int  N,  
int  K,  
pastix_complex64_t  alpha,  
const pastix_complex64_t *  A,  
int  LDA,  
const pastix_complex64_t *  B,  
int  LDB,  
pastix_complex64_t  beta,  
pastix_complex64_t *  C,  
int  LDC,  
const pastix_complex64_t *  D,  
int  incD,  
pastix_complex64_t *  WORK,  
int  LWORK  
) 
Perform one of the following matrixmatrix operations.
C := alpha*op( A )*D*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X',
alpha and beta are scalars, and A, B, C and D are matrices, with
op( A ) an m by k matrix, op( B ) an k by n matrix, C an m by n matrix and D an k by k matrix.
[in]  transA 

[in]  transB 

[in]  M  The number of rows of the matrix op( A ) and of the matrix C. M must be at least zero. 
[in]  N  The number of columns of the matrix op( B ) and the number of columns of the matrix C. N must be at least zero. 
[in]  K  The number of columns of the matrix op( A ), the number of rows of the matrix op( B ), and the number of rows and columns of matrix D. K must be at least zero. 
[in]  alpha  On entry, alpha specifies the scalar alpha. 
[in]  A  pastix_complex64_t array of DIMENSION ( LDA, ka ), where ka is k when transA = PastixTrans, and is m otherwise. Before entry with transA = PastixTrans, the leading m by k part of the array A must contain the matrix A, otherwise the leading k by m part of the array A must contain the matrix A. 
[in]  LDA  On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When transA = PastixTrans then LDA must be at least max( 1, m ), otherwise LDA must be at least max( 1, k ). 
[in]  B  pastix_complex64_t array of DIMENSION ( LDB, kb ), where kb is n when transB = PastixTrans, and is k otherwise. Before entry with transB = PastixTrans, the leading k by n part of the array B must contain the matrix B, otherwise the leading n by k part of the array B must contain the matrix B. 
[in]  LDB  On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. When transB = PastixTrans then LDB must be at least max( 1, k ), otherwise LDB must be at least max( 1, n ). 
[in]  beta  On entry, beta specifies the scalar beta. When beta is supplied as zero then C need not be set on input. 
[in]  C  pastix_complex64_t array of DIMENSION ( LDC, n ). Before entry, the leading m by n part of the array C must contain the matrix C, except when beta is zero, in which case C need not be set on entry. On exit, the array C is overwritten by the m by n matrix ( alpha*op( A )*D*op( B ) + beta*C ). 
[in]  LDC  On entry, LDC specifies the first dimension of C as declared in the calling (sub) program. LDC must be at least max( 1, m ). 
[in]  D  pastix_complex64_t array of DIMENSION ( LDD, k ). Before entry, the leading k by k part of the array D must contain the matrix D. 
[in]  incD  On entry, LDD specifies the first dimension of D as declared in the calling (sub) program. LDD must be at least max( 1, k ). 
[in]  WORK  pastix_complex64_t array, dimension (MAX(1,LWORK)) 
[in]  LWORK  The length of WORK. On entry, if transA = PastixTrans and transB = PastixTrans then LWORK >= max(1, K*N). Otherwise LWORK >= max(1, M*K). 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
Definition at line 139 of file core_zgemdm.c.
References PASTIX_SUCCESS, PastixConjTrans, and PastixNoTrans.
Referenced by core_zhetrfsp1d_gemm(), and core_zsytrfsp1d_gemm().
int core_zpqrcp  (  double  tol, 
pastix_int_t  maxrank,  
int  full_update,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
pastix_complex64_t *  tau,  
pastix_complex64_t *  work,  
pastix_int_t  lwork,  
double *  rwork  
) 
Compute a rankreavealing QR factorization.
This routine is originated from the LAPACK kernels zgeqp3/zlaqps and was modified by A. Buttari for MUMPSBLR. In this version the stopping criterion is based on the frobenius norm of the residual, and not on the estimate of the twonorm making it more restrictive. Thus, the returned ranks are larger, but this gives a better accuracy.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  full_update  If true, all the trailing submatrix is updated, even if maxrank is reached. If false, the trailing submatrix is not updated as soon as it is not worth it. (Unused for now but kept to match API of RQRCP and TQRCP) 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 105 of file core_zpqrcp.c.
References pastix_int_t.
Referenced by core_zge2lr_pqrcp(), and core_zrradd_pqrcp().
int core_zrqrcp  (  double  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
pastix_complex64_t *  tau,  
pastix_complex64_t *  work,  
pastix_int_t  lwork,  
double *  rwork  
) 
Compute a randomized QR factorization.
This kernel implements the algorithm described in: Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Lowrank Matrix Approximations. Jianwei Xiao, Ming Gu, and Julien Langou
The main difference in this algorithm relies in two points: 1) First, we stop the iterative porcess based on a tolerance criterion forwarded to the QR with column pivoting kernel, while they have a fixed number of iterations defined by parameter. 2) Second, we perform an extra PQRCP call on the trailing submatrix to finalize the computations, while in the paper above they use a spectrum revealing algorithm to refine the solution.
More information can also be found in Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. N. Halko, P. G. Martinsson, and J. A. Tropp
Based on this paper, we set the p parameter to 5, as it seems sufficient, and because we iterate multiple times to get the final rank.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  Enable/disable the extra refinement step that performs an additional PQRCP on the trailing submatrix. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 114 of file core_zrqrcp.c.
References pastix_int_t.
Referenced by core_zge2lr_rqrcp(), and core_zrradd_rqrcp().
int core_zrqrrt  (  double  tol, 
pastix_int_t  maxrank,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_complex64_t *  tau,  
pastix_complex64_t *  B,  
pastix_int_t  ldb,  
pastix_complex64_t *  tau_b,  
pastix_complex64_t *  work,  
pastix_int_t  lwork,  
double  normA  
) 
Compute a randomized QR factorization with rotation technique.
This kernel implements the second method (he did not write a pseudocode for the second method) described in :
Blocked rankrevealing QR factorizations: How randomized sampling can be used to avoid singlevector pivoting. P. G. Martinsson
Note that we only use the trailing matrix for resampling. We don't use power of it for getting better results, since it would be too costly.
Difference from randomized QRCP is : 1) resampling is used instead of sample update formula 2) Instead of pivoting A, rotation is applied on it 3) Instead of working with B and omega, directly their transposes are handled for simplicity
The main difference in this algorithm compared to figure 5 in the Martinsson's paper: 1) First, we stop the iterative process based on a tolerance criterion 2) Second, we do not apply SVD for gathering the mass on the diagonal blocks, so our method is less costly but we expect it to be less close to SVD result 3) Third, we do not apply the power iteration for resampling for a closer result to SVD, since it is too costly
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in,out]  A  The matrix of dimension ldabyn that needs to be compressed. On output, the A matrix is computed up to the founded rank k (k first columns and rows). Everything else, should be dropped. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix A. 
[out]  B  The matrix of dimension ldbbymaxrank that will holds the partial projection of the matrix A. On output, each block of 32 columns can be used to computed the reverse rotation of the R part of A. 
[in]  ldb  The leading dimension of the matrix B. ldb >= max(1, n). 
[out]  tau_b  Contains scalar factors of the elementary reflectors for the matrix B. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * max(n,n)) If lwork == 1, the function returns immediately and work[0] contains the optimal size of work. 
[in]  normA  The norm of the input matrixA. If negative, the norm is computed by the kernel. 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 126 of file core_zrqrrt.c.
References pastix_int_t.
Referenced by core_zge2lr_rqrrt().
int core_ztqrcp  (  double  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
pastix_complex64_t *  tau,  
pastix_complex64_t *  work,  
pastix_int_t  lwork,  
double *  rwork  
) 
Compute a randomized QR factorization with truncated updates.
This routine is derivated from "Randomized QR with Column Pivoting", J. A. Duersch and M. Gu, SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. C263C291, 2017.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  TODO 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 99 of file core_ztqrcp.c.
References pastix_int_t.
Referenced by core_zge2lr_tqrcp(), and core_zrradd_tqrcp().
int core_ztradd  (  pastix_uplo_t  uplo, 
pastix_trans_t  trans,  
pastix_int_t  M,  
pastix_int_t  N,  
pastix_complex64_t  alpha,  
const pastix_complex64_t *  A,  
pastix_int_t  LDA,  
pastix_complex64_t  beta,  
pastix_complex64_t *  B,  
pastix_int_t  LDB  
) 
Add two triangular matrices together as in PBLAS pztradd.
B < alpha * op(A) + beta * B,
where op(X) = X, X', or conj(X')
[in]  uplo  Specifies the shape of A and B matrices:

[in]  trans  Specifies whether the matrix A is nontransposed, transposed, or conjugate transposed

[in]  M  Number of rows of the matrices op(A) and B. 
[in]  N  Number of columns of the matrices op(A) and B. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans = PastixNoTrans, LDAbyM otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,M) if trans = PastixNoTrans, LDA >= max(1,N) otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. On exit, B = alpha * op(A) + beta * B 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
PastixLower
PastixUpper
Definition at line 79 of file core_ztradd.c.
References core_zgeadd(), PASTIX_SUCCESS, PastixConjTrans, PastixLower, PastixNoTrans, PastixUpper, and PastixUpperLower.
int core_zscalo  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
const pastix_complex64_t *  A,  
pastix_int_t  lda,  
const pastix_complex64_t *  D,  
pastix_int_t  ldd,  
pastix_complex64_t *  B,  
pastix_int_t  ldb  
) 
Scale a matrix by a diagonal out of place.
Perform the operation: B < op(A) * D, where A is a general matrix, and D a diagonal matrix.
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A. 
[in]  A  Matrix of size ldabyN. 
[in]  lda  Leading dimension of the array A. lda >= max(1,M). 
[in]  D  Diagonal matrix of size lddbyN. 
[in]  ldd  Leading dimension of the array D. ldd >= 1. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  ldb  Leading dimension of the array B. ldb >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 74 of file core_zscalo.c.
References pastix_int_t, PastixConjTrans, and PastixNoTrans.
Referenced by cpublok_zscalo(), and cpucblk_zscalo().
pastix_fixdbl_t core_zlrorthu_fullqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  rank,  
pastix_complex64_t *  U,  
pastix_int_t  ldu,  
pastix_complex64_t *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
U = Q R, then U' = Q, and V' = R * V
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  rank  The number of columns of the U matrix, and the number of rows of the V part in the v1v2 matrix. 
[in,out]  U  The U matrix of size ldu by rank. On exit, Q from U = Q R. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, R * V, with R from U = Q R. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 83 of file core_zlrothu.c.
References pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_zlrorthu_partialqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
pastix_complex64_t *  U,  
pastix_int_t  ldu,  
pastix_complex64_t *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 197 of file core_zlrothu.c.
References core_zlrdbg_check_orthogonality_AB(), pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_zlrorthu_cgs  (  pastix_int_t  M1, 
pastix_int_t  N1,  
pastix_int_t  M2,  
pastix_int_t  N2,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
pastix_complex64_t *  U,  
pastix_int_t  ldu,  
pastix_complex64_t *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M1  The number of rows of the U matrix. 
[in]  N1  The number of columns of the U matrix. 
[in]  M2  The number of rows of the u2 part of the U matrix. 
[in]  N2  The number of columns of the v2 part of the V matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 430 of file core_zlrothu.c.
References core_zlrdbg_check_orthogonality_AB(), and pastix_int_t.
void core_zpotrfsp  (  pastix_int_t  n, 
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t .
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with Cholesky factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 144 of file core_zpotrfsp.c.
References core_zpotf2sp(), and pastix_int_t.
void core_zpxtrfsp  (  pastix_int_t  n, 
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting LL^t factorization of the matrix nbyn A = L * L^t .
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LL^t factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 136 of file core_zpxtrfsp.c.
References core_zpxtf2sp(), and pastix_int_t.
void core_zgetrfsp  (  pastix_int_t  n, 
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LU factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the number of pivots is incremented. 
Definition at line 138 of file core_zgetrfsp.c.
References core_zgetf2sp(), and pastix_int_t.
void core_zsytrfsp  (  pastix_int_t  n, 
pastix_complex64_t *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LDL^t factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 143 of file core_zsytrfsp.c.
References core_zsytf2sp(), and pastix_int_t.
void core_dplrnt  (  int  m, 
int  n,  
double *  A,  
int  lda,  
int  gM,  
int  m0,  
int  n0,  
unsigned long long int  seed  
) 
Generate a random tile.
[in]  m  The number of rows of the tile A. m >= 0. 
[in]  n  The number of columns of the tile A. n >= 0. 
[in,out]  A  On entry, the mbyn tile to be initialized. On exit, the tile initialized in the mtxtype format. 
[in]  lda  The leading dimension of the tile A. lda >= max(1,m). 
[in]  gM  The global number of rows of the full matrix, A is belonging to. gM >= (m0+M). 
[in]  m0  The index of the first row of tile A in the full matrix. m0 >= 0. 
[in]  n0  The index of the first column of tile A in the full matrix. n0 >= 0. 
[in]  seed  The seed used for random generation. Must be the same for all tiles initialized with this routine. 
Definition at line 91 of file core_dplrnt.c.
void core_dgetmo  (  int  m, 
int  n,  
const double *  A,  
int  lda,  
double *  B,  
int  ldb  
) 
Transposes a mbyn matrix out of place using an extra workspace of size mbyn.
[in]  m  Number of rows of A. 
[in]  n  Number of columns of A. 
[in]  A  Matrix to be transposed. 
[in]  lda  Leading dimension of matrix A. 
[in,out]  B  On exit B = trans(A). 
[in]  ldb  Leading dimension of matrix B. 
Definition at line 49 of file core_dgetmo.c.
Referenced by core_dlr2ge().
int core_dgeadd  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
double  alpha,  
const double *  A,  
pastix_int_t  LDA,  
double  beta,  
double *  B,  
pastix_int_t  LDB  
) 
Add two matrices together.
Perform the operation: B < alpha * op(A) + B
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A, if trans == PastixNoTrans, number of columns of A otherwise. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A, if trans == PastixNoTrans, number of rows of A otherwise. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans == PastixNoTrans, LDAbyM, otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,K). K = M if trans == PastixNoTrans, K = N otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 78 of file core_dgeadd.c.
References PASTIX_SUCCESS, PastixNoTrans, and PastixTrans.
Referenced by core_dlrconcatenate_v(), core_dlrcpy(), core_dtradd(), cpublok_dadd_frfr(), cpucblk_dadd_frfr(), cpucblk_ddiff(), cpucblk_dgeaddsp1d(), cpucblk_dgetrfsp1d_getrf(), cpucblk_dgetschur_fr(), and cpucblk_dgetschur_lr().
int core_dgemdm  (  pastix_trans_t  transA, 
pastix_trans_t  transB,  
int  M,  
int  N,  
int  K,  
double  alpha,  
const double *  A,  
int  LDA,  
const double *  B,  
int  LDB,  
double  beta,  
double *  C,  
int  LDC,  
const double *  D,  
int  incD,  
double *  WORK,  
int  LWORK  
) 
Perform one of the following matrixmatrix operations.
C := alpha*op( A )*D*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X',
alpha and beta are scalars, and A, B, C and D are matrices, with
op( A ) an m by k matrix, op( B ) an k by n matrix, C an m by n matrix and D an k by k matrix.
[in]  transA 

[in]  transB 

[in]  M  The number of rows of the matrix op( A ) and of the matrix C. M must be at least zero. 
[in]  N  The number of columns of the matrix op( B ) and the number of columns of the matrix C. N must be at least zero. 
[in]  K  The number of columns of the matrix op( A ), the number of rows of the matrix op( B ), and the number of rows and columns of matrix D. K must be at least zero. 
[in]  alpha  On entry, alpha specifies the scalar alpha. 
[in]  A  double array of DIMENSION ( LDA, ka ), where ka is k when transA = PastixTrans, and is m otherwise. Before entry with transA = PastixTrans, the leading m by k part of the array A must contain the matrix A, otherwise the leading k by m part of the array A must contain the matrix A. 
[in]  LDA  On entry, LDA specifies the first dimension of A as declared in the calling (sub) program. When transA = PastixTrans then LDA must be at least max( 1, m ), otherwise LDA must be at least max( 1, k ). 
[in]  B  double array of DIMENSION ( LDB, kb ), where kb is n when transB = PastixTrans, and is k otherwise. Before entry with transB = PastixTrans, the leading k by n part of the array B must contain the matrix B, otherwise the leading n by k part of the array B must contain the matrix B. 
[in]  LDB  On entry, LDB specifies the first dimension of B as declared in the calling (sub) program. When transB = PastixTrans then LDB must be at least max( 1, k ), otherwise LDB must be at least max( 1, n ). 
[in]  beta  On entry, beta specifies the scalar beta. When beta is supplied as zero then C need not be set on input. 
[in]  C  double array of DIMENSION ( LDC, n ). Before entry, the leading m by n part of the array C must contain the matrix C, except when beta is zero, in which case C need not be set on entry. On exit, the array C is overwritten by the m by n matrix ( alpha*op( A )*D*op( B ) + beta*C ). 
[in]  LDC  On entry, LDC specifies the first dimension of C as declared in the calling (sub) program. LDC must be at least max( 1, m ). 
[in]  D  double array of DIMENSION ( LDD, k ). Before entry, the leading k by k part of the array D must contain the matrix D. 
[in]  incD  On entry, LDD specifies the first dimension of D as declared in the calling (sub) program. LDD must be at least max( 1, k ). 
[in]  WORK  double array, dimension (MAX(1,LWORK)) 
[in]  LWORK  The length of WORK. On entry, if transA = PastixTrans and transB = PastixTrans then LWORK >= max(1, K*N). Otherwise LWORK >= max(1, M*K). 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
Definition at line 139 of file core_dgemdm.c.
References PASTIX_SUCCESS, PastixNoTrans, and PastixTrans.
Referenced by core_dsytrfsp1d_gemm().
int core_dpqrcp  (  double  tol, 
pastix_int_t  maxrank,  
int  full_update,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
double *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
double *  tau,  
double *  work,  
pastix_int_t  lwork,  
double *  rwork  
) 
Compute a rankreavealing QR factorization.
This routine is originated from the LAPACK kernels dgeqp3/dlaqps and was modified by A. Buttari for MUMPSBLR. In this version the stopping criterion is based on the frobenius norm of the residual, and not on the estimate of the twonorm making it more restrictive. Thus, the returned ranks are larger, but this gives a better accuracy.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  full_update  If true, all the trailing submatrix is updated, even if maxrank is reached. If false, the trailing submatrix is not updated as soon as it is not worth it. (Unused for now but kept to match API of RQRCP and TQRCP) 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 105 of file core_dpqrcp.c.
References pastix_int_t.
Referenced by core_dge2lr_pqrcp(), core_drqrcp(), core_drradd_pqrcp(), and core_dtqrcp().
int core_drqrcp  (  double  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
double *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
double *  tau,  
double *  work,  
pastix_int_t  lwork,  
double *  rwork  
) 
Compute a randomized QR factorization.
This kernel implements the algorithm described in: Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Lowrank Matrix Approximations. Jianwei Xiao, Ming Gu, and Julien Langou
The main difference in this algorithm relies in two points: 1) First, we stop the iterative porcess based on a tolerance criterion forwarded to the QR with column pivoting kernel, while they have a fixed number of iterations defined by parameter. 2) Second, we perform an extra PQRCP call on the trailing submatrix to finalize the computations, while in the paper above they use a spectrum revealing algorithm to refine the solution.
More information can also be found in Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. N. Halko, P. G. Martinsson, and J. A. Tropp
Based on this paper, we set the p parameter to 5, as it seems sufficient, and because we iterate multiple times to get the final rank.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  Enable/disable the extra refinement step that performs an additional PQRCP on the trailing submatrix. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 114 of file core_drqrcp.c.
References core_dpqrcp(), and pastix_int_t.
Referenced by core_dge2lr_rqrcp(), and core_drradd_rqrcp().
int core_drqrrt  (  double  tol, 
pastix_int_t  maxrank,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
double *  A,  
pastix_int_t  lda,  
double *  tau,  
double *  B,  
pastix_int_t  ldb,  
double *  tau_b,  
double *  work,  
pastix_int_t  lwork,  
double  normA  
) 
Compute a randomized QR factorization with rotation technique.
This kernel implements the second method (he did not write a pseudocode for the second method) described in :
Blocked rankrevealing QR factorizations: How randomized sampling can be used to avoid singlevector pivoting. P. G. Martinsson
Note that we only use the trailing matrix for resampling. We don't use power of it for getting better results, since it would be too costly.
Difference from randomized QRCP is : 1) resampling is used instead of sample update formula 2) Instead of pivoting A, rotation is applied on it 3) Instead of working with B and omega, directly their transposes are handled for simplicity
The main difference in this algorithm compared to figure 5 in the Martinsson's paper: 1) First, we stop the iterative process based on a tolerance criterion 2) Second, we do not apply SVD for gathering the mass on the diagonal blocks, so our method is less costly but we expect it to be less close to SVD result 3) Third, we do not apply the power iteration for resampling for a closer result to SVD, since it is too costly
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in,out]  A  The matrix of dimension ldabyn that needs to be compressed. On output, the A matrix is computed up to the founded rank k (k first columns and rows). Everything else, should be dropped. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix A. 
[out]  B  The matrix of dimension ldbbymaxrank that will holds the partial projection of the matrix A. On output, each block of 32 columns can be used to computed the reverse rotation of the R part of A. 
[in]  ldb  The leading dimension of the matrix B. ldb >= max(1, n). 
[out]  tau_b  Contains scalar factors of the elementary reflectors for the matrix B. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * max(n,n)) If lwork == 1, the function returns immediately and work[0] contains the optimal size of work. 
[in]  normA  The norm of the input matrixA. If negative, the norm is computed by the kernel. 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 126 of file core_drqrrt.c.
References pastix_int_t.
Referenced by core_dge2lr_rqrrt().
int core_dtqrcp  (  double  tol, 
pastix_int_t  maxrank,  
int  refine,  
pastix_int_t  nb,  
pastix_int_t  m,  
pastix_int_t  n,  
double *  A,  
pastix_int_t  lda,  
pastix_int_t *  jpvt,  
double *  tau,  
double *  work,  
pastix_int_t  lwork,  
double *  rwork  
) 
Compute a randomized QR factorization with truncated updates.
This routine is derivated from "Randomized QR with Column Pivoting", J. A. Duersch and M. Gu, SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. C263C291, 2017.
[in]  tol  The relative tolerance criterion. Computations are stopped when the frobenius norm of the residual matrix is lower than tol. If tol < 0, then maxrank reflectors are computed. 
[in]  maxrank  Maximum number of reflectors computed. Computations are stopped when the rank exceeds maxrank. If maxrank < 0, all reflectors are computed or up to the tolerance criterion. 
[in]  refine  TODO 
[in]  nb  Tuning parameter for the GEMM blocking size. if nb < 0, nb is set to 32. 
[in]  m  Number of rows of the matrix A. 
[in]  n  Number of columns of the matrix A. 
[in]  A  The matrix of dimension ldabyn that needs to be compressed. 
[in]  lda  The leading dimension of the matrix A. lda >= max(1, m). 
[out]  jpvt  The array that describes the permutation of A. 
[out]  tau  Contains scalar factors of the elementary reflectors for the matrix Q. 
[in]  work  Workspace array of size lwork. 
[in]  lwork  The dimension of the work area. lwork >= (nb * n + max(n, m) ) If lwork == 1, the functions returns immediately and work[0] contains the optimal size of work. 
[in]  rwork  Workspace array used to store partial and exact column norms (2byn) 
If maximum rank is 0, then either the matrix norm is below the tolerance, and we can return a null rank matrix, or it is not and we need to return a full rank matrix.
Definition at line 99 of file core_dtqrcp.c.
References core_dpqrcp(), and pastix_int_t.
Referenced by core_dge2lr_tqrcp(), and core_drradd_tqrcp().
int core_dtradd  (  pastix_uplo_t  uplo, 
pastix_trans_t  trans,  
pastix_int_t  M,  
pastix_int_t  N,  
double  alpha,  
const double *  A,  
pastix_int_t  LDA,  
double  beta,  
double *  B,  
pastix_int_t  LDB  
) 
Add two triangular matrices together as in PBLAS pdtradd.
B < alpha * op(A) + beta * B,
where op(X) = X, X', or (X')
[in]  uplo  Specifies the shape of A and B matrices:

[in]  trans  Specifies whether the matrix A is nontransposed, transposed, or conjugate transposed

[in]  M  Number of rows of the matrices op(A) and B. 
[in]  N  Number of columns of the matrices op(A) and B. 
[in]  alpha  Scalar factor of A. 
[in]  A  Matrix of size LDAbyN, if trans = PastixNoTrans, LDAbyM otherwise. 
[in]  LDA  Leading dimension of the array A. LDA >= max(1,M) if trans = PastixNoTrans, LDA >= max(1,N) otherwise. 
[in]  beta  Scalar factor of B. 
[in,out]  B  Matrix of size LDBbyN. On exit, B = alpha * op(A) + beta * B 
[in]  LDB  Leading dimension of the array B. LDB >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
PastixLower
PastixUpper
Definition at line 79 of file core_dtradd.c.
References core_dgeadd(), PASTIX_SUCCESS, PastixLower, PastixNoTrans, PastixTrans, PastixUpper, and PastixUpperLower.
int core_dscalo  (  pastix_trans_t  trans, 
pastix_int_t  M,  
pastix_int_t  N,  
const double *  A,  
pastix_int_t  lda,  
const double *  D,  
pastix_int_t  ldd,  
double *  B,  
pastix_int_t  ldb  
) 
Scale a matrix by a diagonal out of place.
Perform the operation: B < op(A) * D, where A is a general matrix, and D a diagonal matrix.
[in]  trans 

[in]  M  Number of rows of the matrix B. Number of rows of the matrix A. 
[in]  N  Number of columns of the matrix B. Number of columns of the matrix A. 
[in]  A  Matrix of size ldabyN. 
[in]  lda  Leading dimension of the array A. lda >= max(1,M). 
[in]  D  Diagonal matrix of size lddbyN. 
[in]  ldd  Leading dimension of the array D. ldd >= 1. 
[in,out]  B  Matrix of size LDBbyN. 
[in]  ldb  Leading dimension of the array B. ldb >= max(1,M) 
PASTIX_SUCCESS  successful exit 
<0  if i, the ith argument had an illegal value 
1,not  yet implemented 
Definition at line 74 of file core_dscalo.c.
References pastix_int_t, PASTIX_SUCCESS, PastixNoTrans, and PastixTrans.
Referenced by cpublok_dscalo(), and cpucblk_dscalo().
pastix_fixdbl_t core_dlrorthu_fullqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  rank,  
double *  U,  
pastix_int_t  ldu,  
double *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the u part of the lowrank form, and update the v part accordingly using full QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
U = Q R, then U' = Q, and V' = R * V
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  rank  The number of columns of the U matrix, and the number of rows of the V part in the v1v2 matrix. 
[in,out]  U  The U matrix of size ldu by rank. On exit, Q from U = Q R. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, R * V, with R from U = Q R. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 83 of file core_dlrothu.c.
References pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_dlrorthu_partialqr  (  pastix_int_t  M, 
pastix_int_t  N,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
double *  U,  
pastix_int_t  ldu,  
double *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using partial QR.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M  The number of rows of the u1u2 matrix. 
[in]  N  The number of columns of the v1v2 matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 197 of file core_dlrothu.c.
References core_dlrdbg_check_orthogonality_AB(), pastix_int_t, and PastixLeft.
pastix_fixdbl_t core_dlrorthu_cgs  (  pastix_int_t  M1, 
pastix_int_t  N1,  
pastix_int_t  M2,  
pastix_int_t  N2,  
pastix_int_t  r1,  
pastix_int_t *  r2ptr,  
pastix_int_t  offx,  
pastix_int_t  offy,  
double *  U,  
pastix_int_t  ldu,  
double *  V,  
pastix_int_t  ldv  
) 
Try to orthognalize the U part of the lowrank form, and update the V part accordingly using CGS.
This function considers a lowrank matrix resulting from the addition of two matrices B += A, with A of smaller or equal size to B. The product has the form: U * V^t
The U part of the lowrank form must be orthognalized to get the smaller possible rank during the rradd operation. This function perfoms this by applying a full QR factorization on the U part.
In that case, it takes benefit from the fact that U = [ u1, u2 ], and V = [ v1, v2 ] with u2 and v2 wich are matrices of respective size M2byr2, and r2byN2, offset by offx and offy
The steps are:
[in]  M1  The number of rows of the U matrix. 
[in]  N1  The number of columns of the U matrix. 
[in]  M2  The number of rows of the u2 part of the U matrix. 
[in]  N2  The number of columns of the v2 part of the V matrix. 
[in]  r1  The number of columns of the U matrix in the u1 part, and the number of rows of the V part in the v1 part. 
[in,out]  r2ptr  The number of columns of the U matrix in the u2 part, and the number of rows of the V part in the v2 part. On exit, this rank is reduced y the number of null columns found in U. 
[in]  offx  The row offset of the matrix u2 in U. 
[in]  offy  The column offset of the matrix v2 in V. 
[in,out]  U  The U matrix of size ldu by rank. On exit, the orthogonalized U. 
[in]  ldu  The leading dimension of the U matrix. ldu >= max(1, M) 
[in,out]  V  The V matrix of size ldv by N. On exit, the updated V matrix. 
[in]  ldv  The leading dimension of the V matrix. ldv >= max(1, rank) 
Definition at line 430 of file core_dlrothu.c.
References core_dlrdbg_check_orthogonality_AB(), and pastix_int_t.
void core_dpotrfsp  (  pastix_int_t  n, 
double *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting Cholesky factorization of the matrix nbyn A = L * L^t .
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with Cholesky factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 144 of file core_dpotrfsp.c.
References core_dpotf2sp(), and pastix_int_t.
void core_dgetrfsp  (  pastix_int_t  n, 
double *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting LU factorization of the matrix mbyn A = L * U.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LU factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the number of pivots is incremented. 
Definition at line 138 of file core_dgetrfsp.c.
References core_dgetf2sp(), and pastix_int_t.
void core_dsytrfsp  (  pastix_int_t  n, 
double *  A,  
pastix_int_t  lda,  
pastix_int_t *  nbpivots,  
double  criterion  
) 
Compute the block static pivoting factorization of the symmetric matrix nbyn A such that A = L * D * L^t.
[in]  n  The number of rows and columns of the matrix A. 
[in,out]  A  The matrix A to factorize with LDL^t factorization. The matrix is of size lda by n. 
[in]  lda  The leading dimension of the matrix A. 
[in,out]  nbpivots  Pointer to the number of piovting operations made during factorization. It is updated during this call 
[in]  criterion  Threshold use for static pivoting. If diagonal value is under this threshold, its value is replaced by the threshold and the nu,ber of pivots is incremented. 
Definition at line 143 of file core_dsytrfsp.c.
References core_dsytf2sp(), and pastix_int_t.