FABuLOuS numerical results
Table of Contents
- 1. Results Summary
- 2. Setup workstation
- 3. Get Test files
- 4. Cleanup results directory
- 5. Run Examples
- 5.1. Convergence
- 5.1.1. Influence of restart(m) parameter
- 5.1.2. young1c (nrhs=6, m=90, k=5)
- 5.1.3. Influence of deflated restart (k) parameter (nrhs=6, m=90)
- 5.1.4. qr ib dr
- 5.1.5. young1c chameleon comparison QR (nrhs=6, m=90, k=5) (DEPRECATED) chameleon
- 5.1.6. GCR first results nrhs=10 m=300
- 5.1.7. BCG results
- 5.1.8. IB maxkeptdirection parameter
- 5.1.9. BCG results 2
- 5.1.10. BGCRO results check 3 consecutive solve
- 5.2. Timings
Go back to README.html.
1. Results Summary
2. Setup workstation
To be able to run the following test cases, you must have compiled fabulous:
I recommend using guix, as it is quite simple.
Set up the build system:
Finally, compile fabulous:
3. Get Test files
4. Cleanup results directory
5. Run Examples
5.1. Convergence
5.1.1. Influence of restart(m) parameter
5.1.2. young1c (nrhs=6, m=90, k=5)
- run test case
- plot the graphic
In this experiment, I try to mimic the parameters of section 4.2, example 8. The results differ because the right-hand sides are generated randomly with a different random number generator, the kernels used may be different, and some algorithms may differ slightly.
For the IB versions, the algorithms do differ: the experiment presented here implements an extension computing 'Inexact breakdown on R0', while the results presented in the article do not (Figure 2, Example 8, left side).
5.1.5. young1c chameleon comparison QR (nrhs=6, m=90, k=5) (DEPRECATED) chameleon
5.1.6. GCR first results nrhs=10 m=300
5.1.7. BCG results
5.1.9. BCG results 2
5.2. Timings
5.2.1. Influence of incremental QR factorization
- run test case
- plot the graphic
The curve for "fullgels facto" is zero because there is no factorization part in "fullgels". Perhaps unexpectedly, the curve for "incrementalgels solve" is also zero: since GMRES is not a short-term recurrence algorithm, computing the complete least squares solution is not mandatory at each iteration. Only the residual norm is needed, and it can be obtained from the incremental factorization without computing the actual residual.
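The argument above can be written out for standard (single right-hand side) GMRES; the block case is analogous, with the last block of the transformed right-hand side playing the role of the last scalar. The symbols below (the Hessenberg matrix $\bar{H}_j$, its QR factors $Q_j$, $R_j$, and the initial residual norm $\beta$) are notation assumed for this sketch, not taken from the fabulous sources.

```latex
\[
Q_j^{T} \bar{H}_j = \begin{pmatrix} R_j \\ 0 \end{pmatrix},
\qquad
Q_j^{T} (\beta e_1) = \begin{pmatrix} g_j \\ \gamma_{j+1} \end{pmatrix}
\]
\[
\left\| \beta e_1 - \bar{H}_j y \right\|_2^2
  = \left\| g_j - R_j y \right\|_2^2 + \left| \gamma_{j+1} \right|^2
\quad\Longrightarrow\quad
\min_{y} \left\| \beta e_1 - \bar{H}_j y \right\|_2 = \left| \gamma_{j+1} \right|
\]
```

The minimum is reached for $R_j y = g_j$, but its value $|\gamma_{j+1}|$ is known as soon as the factorization has been updated. The triangular solve is therefore only required when the iterate is actually formed (at a restart or at convergence), which is consistent with a zero "solve" curve for the incremental version.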
5.2.2. qr ib dr timing
5.2.3. with chameleon (DEPRECATED) chameleon
5.2.4. big matrix test 'perf1' parade
- Description
This section may contain big test cases, which may not work if your machine does not have enough memory.
Reminder:
- "-p" is the number of right-hand sides
- "-m" is the maximum Krylov space size
- "-M" is the maximum number of matrix-vector products
"BAD" matrices are matrices that theoretically have bad convergence:
- real case: they are tridiagonal with 4 on the diagonal, -1.0 under the diagonal and -2.99 above the diagonal
- complex case: diag = 4 - 2i, sub-diag = -2.99+i and over-diag = -1+i.
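To make the two definitions concrete, the matrices can be written out as follows. The names $A_{\mathbb{R}}$ and $A_{\mathbb{C}}$ are notation introduced for this illustration only; the entries are those listed above.

```latex
\[
A_{\mathbb{R}} =
\begin{pmatrix}
 4  & -2.99  &        &        \\
 -1 & 4      & -2.99  &        \\
    & \ddots & \ddots & \ddots \\
    &        & -1     & 4
\end{pmatrix},
\qquad
A_{\mathbb{C}} =
\begin{pmatrix}
 4-2i    & -1+i   &         &        \\
 -2.99+i & 4-2i   & -1+i    &        \\
         & \ddots & \ddots  & \ddots \\
         &        & -2.99+i & 4-2i
\end{pmatrix}
\]
```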
There are two kinds of tests:
- MAX_KRYLOV_SPACE = 2000 and DIMENSION = 5000
- MAX_KRYLOV_SPACE = 5000 and DIMENSION = 20000
MAX_MVP is set to a very large value (100000) so that all tests reach convergence.
The ORTHOGONALIZATION_SCHEME used is CGS (the cheapest).
There are two kinds of graphics:
- Time, or percentage of time, of certain steps in an iteration with respect to the size of the Krylov space (data from different restarts are therefore grouped together by Krylov space size).
- Time of steps, or total time of iterations, with respect to the index of the current iteration over the whole run of the algorithm.
- Batch script
#SBATCH --job-name=fabulous_perf1
#SBATCH --output=fabulous_perf1_out_1
#SBATCH --error=fabulous_perf1_err_1
#SBATCH --exclusive
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --time=04:00:00
#SBATCH --partition=routage
#SBATCH --constrain=bora

#source /home/${SBATCH_ACCOUNT}/.bashrc
#source /home/${SBATCH_ACCOUNT}/fabulous/.plafrim_module_used
#export STARPU_FXT_PREFIX=/home/tmijieux/fabulous/build/
#export STARPU_GENERATE_TRACE=1

FABULOUS_DIR=/home/msimonin/Repositories/fabulous
cd ${FABULOUS_DIR}/build/
GUIX_ENV="--pure fabulous --with-source=fabulous=${FABULOUS_DIR}"

guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 5000 -m 2000 -M 100000 -s CGS -o perf1_std_1 -x
guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 5000 -m 2000 -M 100000 -s CGS -A QR -o perf1_qr_1 -x
guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 5000 -m 2000 -M 100000 -s CGS -A IB -o perf1_ib_1 -x
guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 5000 -m 2000 -M 100000 -s CGS -A QRDR -k 20 -r DEFLATED -o perf1_qrdr_1 -x
guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 5000 -m 2000 -M 100000 -s CGS -A QRIBDR -o perf1_qrib_1 -x

guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 20000 -m 5000 -M 100000 -s CGS -A IB -o perf1_ib_2 -x
guix environment ${GUIX_ENV} -- ./src/test/cmd/fabulous_test -S BAD -l BAD -n 20000 -m 5000 -M 100000 -s CGS -A QRIBDR -o perf1_qrib_2 -x
#./fabulous_test_cham -S BAD -l BAD -n 20000 -m 5000 -M 100000 -s CGS -A CHAMIB -o perf1_cham_ib_2 -x

tar czvf perf1_results.tar.gz *.res *.kernel.txt
mv perf1_results.tar.gz /home/$SBATCH_ACCOUNT
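The runs in the batch script differ only in the problem dimension, the maximum Krylov space size, the output name and a few variant flags. The following is a minimal dry-run sketch of that pattern: the helper name perf1_cmd is hypothetical (it is not part of fabulous), the flags are copied from the batch script, and the function only prints a command line without launching guix or the solver.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints one perf1 command line instead of running it.
# Usage: perf1_cmd <dimension> <max_krylov_space> <output_name> [variant flags...]
perf1_cmd() {
    local dim="$1" max_space="$2" out="$3"
    shift 3
    # Flags shared by every run of the batch script.
    local cmd="./src/test/cmd/fabulous_test -S BAD -l BAD -n ${dim} -m ${max_space} -M 100000 -s CGS"
    # Append the variant-specific flags (for instance: -A IB), if any were given.
    if [ "$#" -gt 0 ]; then
        cmd="${cmd} $*"
    fi
    echo "${cmd} -o ${out} -x"
}

perf1_cmd 5000 2000 perf1_std_1
perf1_cmd 5000 2000 perf1_ib_1 -A IB
perf1_cmd 20000 5000 perf1_qrib_2 -A QRIBDR
```

Printing the commands first makes it easy to check the parameter combinations before submitting the job; prefixing each printed line with the guix environment call would reproduce the corresponding lines of the batch script.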
- IB and QR-IB
- runs
- graphic
- little
Here ib_1 is the INEXACT BREAKDOWN variant with the full least squares version, while qrib_1 is the IB version with incremental factorization.
The test cases ending with _1 have the following noticeable parameters: Problem/Vector size = 5000, Maximum size of Krylov space = 2000.
- gels perf1_ib_1 is the copy of the Hessenberg plus the full least squares kernel call.
- facto perf1_ib_1 corresponds to nothing (it would be the factorization part, and is therefore null).
- facto perf1_qrib_1 is the incremental factorization, with factorization of the last block line and last block column.
- gels perf1_qrib_1 is the solve in the QRIB case, which contains:
  - a copy of the Hessenberg,
  - a piece of factorization with the last line (TSQRT L over H),
  - the application of all factorization updates to the right-hand sides (because of the double update: IB and incremental QR),
  - finally the 'solve' (triangular trsm kernel).
The first graph shows the percentage of the time of an iteration taken by the different parts presented above. The other important parts of an iteration in this case are the matrix-vector multiplication (roughly constant across iterations) and the basis orthogonalization step, whose duration depends on the Krylov space size (abscissa).
The second graph shows the raw duration of each step.
The third graph shows the cumulated time of each iteration, for the IB and QRinc-IB versions.
- less little
Same thing with a bigger test case: Vector/Problem size = 20000, Maximum size of Krylov space = 5000.
- little
- runs
- QR and Restarting
- IB and CHAM-IB (DEPRECATED) chameleon
The tests in this section compare poorly against each other: in the executable linked with chameleon, the parts that do not use chameleon do not benefit from multi-threaded kernels, because chameleon requires linking against a sequential BLAS. See file NOTES.org, section "linking with lapacke/cblas kernels".
- runs
perf1_ib_2 from less little
- graphic
- runs
- Kernel parameters analysis
- parade REMOTE
Setup connection with plafrim
export TERM=xterm
hostname
echo $WORKDIR
Launch the experiments
cd ${WORKDIR}
#git pull origin develop
emacs -batch --load ~/.emacs.d/init.el RESULTS.org --funcall org-babel-tangle
sbatch scripts/batch_perf1.sh
squeue -u tmijieux
- parade RETRIEVE LOCAL