Work on Coarray Fortran
This directory is about co-array Fortran (CAF) parallel programming at MSI. Routines were compiled and executed on a Cray XT5 computer (Hopper at NERSC).

Notes and programs:
- Notes by John Reid, ISO/IEC JTC1/SC22/WG5 N1824-2
- User's manual
- A simple dot product where vectors are read from files
- A simple, read from files, dot product, preparing modules
- Modified Gram-Schmidt program for vectors read by each processor from separate files, moving towards a modular version (kind=dbl defined):
- Modules for Modified Gram-Schmidt, vectors read from files
- Modules for Modified Gram-Schmidt, vectors are random (independent of processors)
- A sparse matrix-vector product in co-array Fortran, for a small (20x20) fixed-size matrix on 4 processors; the matrix uses a derived type with variable size on each processor.
- script for submitting PBS job on Hopper
- dimensions.F08
- IO_module.F08
- lanczos_module.F08
- main_randoms.F08
- module_dot.F08
- physical.F08 contains the sparse CSR matrix, built by hand for now, on each processor.
- Makefile
- out (output file of execution; builds the matrix and does a parallel matvec on a random vector)
- Version 0.0.0 of sparse Lanczos without reorthogonalization in co-array fortran, for a small (20X20) fixed size matrix on 4 processors, same as above with some cleaning up.
- script for submitting PBS job on Hopper or Franklin
- dimensions.F08
- output file for running job
- IO_module.F08
- physical.F08 The answer for the eigenvalues is written there (computed with LAPACK, or MPI-Lanczos)
- module_dot.F08
- lanczos_module.F08 This is the core of the Lanczos routines; it most likely contains too many "sync all" statements
- main_lanczos.F08 Basically the driver, for now matrix is very small, for tests
- Makefile
- Version 0.0.1 of sparse Lanczos with modified Gram-Schmidt in co-array fortran, for a small (20X20) fixed size matrix on 4 processors.
- script for submitting PBS job on Hopper or Franklin
- dimensions.F08
- GramSchmidt.F08
- IO_module.F08
- lanczos_module.F08 sparse_matvec is now in math_operations.F08.
- main_lanczos.F08 The main routine
- Makefile
- math_operations.F08 sparse_matvec is now in this module, along with dot-product.
- physical.F08
- out output of run on Hopper
- Version 0.0.2 of sparse Lanczos with both classical and modified Gram-Schmidt in co-array fortran, for a small (20X20) fixed size matrix on 4 processors.
- script for submitting PBS job on Hopper or Franklin
- dimensions.F08
- GramSchmidt.F08
- IO_module.F08
- lanczos_module.F08 sparse_matvec is now in math_operations.F08.
- main_lanczos.F08 The main routine
- Makefile
- math_operations.F08
- out
- physical.F08
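The Version 0.0.x programs above are built around a sparse matrix-vector product on a CSR matrix. As a rough illustration of that storage scheme, here is a minimal sequential sketch in Python. It is not the author's Fortran code: the names `csr_matvec`, `row_ptr`, `col_idx`, and `values` are hypothetical, indices are 0-based (Fortran would normally use 1-based), and the distribution of rows over processors is left out.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A*x for a matrix A stored in CSR form (0-based indices).

    row_ptr[i]..row_ptr[i+1]-1 are the positions in col_idx/values
    that hold the nonzeros of row i.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# Small 3x3 example (illustrative data only):
#     [2 0 1]
# A = [0 3 0]
#     [4 0 5]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)
```

In a distributed version each processor would hold only its own block of rows, which is presumably why the matrix derived type has a different size on each processor.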
- Sparse Lanczos with modified Gram-Schmidt full reorthogonalization, in co-array fortran, using sequential distributed-size expansion module: Version of June 13th, 2011
- script for submitting PBS job on Hopper
- INPUT.DAT file read by program (note that full-reortho is always performed)
- out file on Hopper
- Makefile
- resulting eigenvalues
- dimensions.f90 [note that ja is now a two-variable index including proc(:)]
- dotproduct.f90 (includes a binary-tree reduction based on IEOR, similar to first-generation MPI)
- dse.f90 (entire module based on pARMS; this code is sequential, no parallel codes are available)
- gramschmidt.f90 (modified is used; classical is commented out)
- graph_partition.f90 (driver for distributed-size expansion method)
- lanczos.f90 (lanczos steps, with full re-ortho)
- input_output.f90 (for debugging purposes)
- main_NLC_lanczos.f90 (entire driver for the calculation of eigenvalues)
- matvec.f90 (This matrix-vector uses CSR format and takes advantage of communications using DSE)
- NLC_matrix.f90 (This is the Numerical-linked cluster builder of CSR matrix)
- partial_reortho.f90 (Partial re-ortho; bugs, will use Haw-ren Fang's version instead)
- permute.f90 (permutation of CSR matrix, sequential version)
- random_vector.f90
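The dotproduct.f90 entry above mentions a binary-tree reduction based on IEOR, Fortran's bitwise exclusive-or intrinsic. One common way to use XOR in a reduction is the butterfly pattern, where at each stage an image pairs with the image whose index differs in one bit. The Python sketch below simulates that pattern sequentially; it is an assumption about the general technique, not a transcription of dotproduct.f90. The function names are hypothetical, image indices are 0-based here (Fortran images are 1-based), and the number of images is assumed to be a power of two.

```python
def xor_butterfly_sum(partials):
    """Simulate an all-reduce sum over P = 2**k images using XOR pairing.

    partials[i] is the local value on image i. After log2(P) stages,
    every image holds the global sum.
    """
    p = len(partials)
    if p == 0 or p & (p - 1):
        raise ValueError("number of images must be a power of two")
    totals = list(partials)
    mask = 1
    while mask < p:
        # Each image adds the value currently held by its partner (i XOR mask).
        totals = [totals[i] + totals[i ^ mask] for i in range(p)]
        mask <<= 1
    return totals


def distributed_dot(x_blocks, y_blocks):
    """Dot product of vectors split into one block per image."""
    partials = [sum(a * b for a, b in zip(xb, yb))
                for xb, yb in zip(x_blocks, y_blocks)]
    return xor_butterfly_sum(partials)


# Four simulated images, two vector entries each (illustrative data only).
x_blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y_blocks = [[1.0, 1.0]] * 4
result = distributed_dot(x_blocks, y_blocks)
print(result)
```

In an actual coarray code each stage would need synchronization between the paired images before the partner's value is read.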
- complex GMRES code using simple matvec (diagonal) Version of June 17th, 2011
- out file on Hopper
- ZGMRES file for submitting a job
- INPUT.DAT file
- CSR_matvec.f90 module. in this version the matvec is simple diagonal (based on pARMS' example)
- dimensions.f90: prepares the CSR format, it basically uses only dbl format in this example
- dotproduct.f90: complex dot product with binary tree
- gmres.f90: complex GMRES program
- gramschmidt.f90: complex modified gram-schmidt
- main_zgmres.f90: main driver based on pARMS' example, now in CAF/F08
- Makefile for Hopper
- matrix_data.f90: contains some inputs, CSR construction, next: this is here.
- random_vector.f90: random vector (this is not the same, but similar to, Lanczos above)
- complex GMRES code using CSR matrix-vector operation Version of July 5th, 2011
- Makefile for Hopper
- out file on Hopper
- INPUT.DAT file, this contains the size of the 2D mesh
- ZGMRES file for submitting a job
- test_gmres_prec.f90: main driver for constructing complex matrix and calling ZGMRES
- random_vector.f90: random vector (not the same but similar to Lanczos above)
- matrix_data.f90: routine that builds a complex symmetric matrix, structure of IDMFT
- complex_gmres.f90: The main complex GMRES solver, diagonal preconditioner is ready
- dimensions.f90: CSR matrix is defined here using derived types
- gramschmidt.f90: Gram-Schmidt orthogonalization, in CAF
- dotproduct.f90: dot product with IEOR reduction in coarray fortran
- CSR_matvec.f90: CSR matvec that uses the derived type of CSR-CAF matrices
- Notes
- Tar.gz file is available here
- Matsubara frequencies are not (yet) "embarrassingly parallel" (next version).
- All matrices are distributed.
- Laplacian is 3D, but impurity is 2D, Fermionic-Bosonic Hamiltonian.
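The GMRES codes above rely on a complex modified Gram-Schmidt step (gramschmidt.f90). A minimal sequential sketch of that orthogonalization in Python is given below. It assumes the standard Hermitian inner product (conjugating the first argument), which may or may not match the choice made in the CAF code. The names `inner` and `modified_gram_schmidt` are hypothetical, and nothing here is distributed over images.

```python
import math


def inner(u, v):
    """Hermitian inner product <u, v> = sum(conj(u_i) * v_i)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def modified_gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent complex vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Modified variant: project using the already-updated w,
            # not the original v as classical Gram-Schmidt would.
            h = inner(q, w)
            w = [wi - h * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        if norm == 0.0:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


# Three independent complex vectors (illustrative data only).
vectors = [
    [1 + 1j, 0j, 1j],
    [1 + 0j, 1j, 0j],
    [0j, 1 + 0j, 1 - 1j],
]
q = modified_gram_schmidt(vectors)
```

In a distributed setting each call to `inner` would become a global reduction across images, which is where a dot product module such as dotproduct.f90 would come in.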
- Diagonal of inverse and 2D-IDMFT code for Fermi-Boson case. Version of Sept 21st, 2011
- Notes
- Tar.gz file is available here
- All matrices are distributed.
- Laplacian is 3D (as in version 1); all variables are now 3D, using the Fermionic-Bosonic Hamiltonian.
- Diagonal of inverse and 3D-IDMFT code for Fermi-Boson case. Version of Feb 16th, 2012
- complex_gmres.f90
- dimensions.f90
- dotproduct.f90
- fermiboson.f90
- gramschmidt.f90
- idmft_2d_CAF
- IDMFT_matrix.f90
- image_zgmres.f90
- INPUT.DAT
- lowranklanczos.f90
- main_3D_lowrankLanczos.f90
- Makefile
- matrixDDM.out
- matvec.f90
- out
- preconditioning.f90
- random_vector.f90
- SOLUTIONS =Solutions obtained after the 2 iterations defined in INPUT. Files are Greens_function.(Matsubara_frequency). First column is the matrix index, second, third, and fourth columns are the 3D cartesian index of the solution: For example, with gnuplot: "plot Greens_function.00 u 1:5" shows a plot of the Green's function solution using the natural ordering.
- Notes
- I keep the extension ".f90" just because the vim editor does not recognize extensions such as f08 or even f03.
- Diagonal of inverse and 3D-IDMFT code for Fermi-Boson case. Version of Feb 1st, 2012
- Makefile
- TEAM_Matsubara =PBS script for submitting job on Hopper or Franklin.
- main_3D_lowrankLanczos.f90 =main program that manages the IDMFT loop.
- dimensions.f90 =definitions of the derived types and explanation of matrix structure.
- laplace_stencil.f90 =Construction of the matrix, as well as its distribution per Matsubara blocks
- random_vector.f90 =Generates random initial vectors, may or may not be used
- dotproduct.f90 =Computes the dot product on each Matsubara group, binary tree has a sync problem, using sequential reduction for now
- fermiboson.f90 =Contains everything specific to Falicov-Kimball Fermi-Boson model, impurity, density, etc.
- input_output.f90 =Reads the input file and outputs the solutions
- image_zgmres.f90 =Does GMRES on each of the subdomains independently. This is B^-1 in the notes. There is no preconditioning here, just reorthogonalization. The matrices are very small.
- matvec.f90 =Does the two required matvecs: one per block for image_zgmres, and the CAF_matvec of complex_gmres.
- preconditioning.f90 =This uses the B^-1 solution found by image_zgmres as a preconditioner.
- gramschmidt.f90 =Gram-Schmidt for the complex_gmres. The Gram-Schmidt for Lanczos is done directly inside the Lanczos loop. Full reorthogonalization.
- complex_gmres.f90 =Complex GMRES algorithm done on each Matsubara block (similar to MPI_COMM_Group, but using "sync images(Block_Matsubara)" instead).
- lowranklanczos.f90 =Low-rank Lanczos-based algorithm for finding the diagonal of the inverse of a complex symmetric matrix, entirely in CAF. Uses Freund's algorithm.
- INPUT.DAT =Input file for IDMFT and for the low-rank Lanczos code. There are very few internal parameters besides these (maybe 3 or 4 still remain in the source files). For example, image_zgmres.f90 does not print convergence; do "grep '(6' image_zgmres.f90", for example.
- out =Output obtained from a run on Hopper at NERSC. "grep beta out" shows low-rank-Lanczos convergence. "grep iteration= out" shows IDMFT convergence
- SOLUTIONS =Solutions obtained after the 5 iterations defined in INPUT. Files are Greens_function.(Matsubara_frequency). First column is the matrix index, second and third columns are the 2D cartesian index of the solution: For example, with gnuplot: "splot Greens_function.00 u 2:3:4" shows a plot of the Green's function solution.
- Notes
- I keep the extension ".f90" just because the vim editor does not recognize extensions such as f08 or even f03.
- A preconditioner derived type has been included. It is dense, but should eventually be replaced with a sparse preconditioner.
- Each allocation has been measured, and the total memory can be output at any step of the program. Search for the variables ALLOC_REAL, ALLOC_CMPLX, or ALLOC_INT.
- Works in 3D or in 2D for the hopping matrix.
- Falicov-Kimball model Hamiltonian.
- Diagonal of inverse and 3D-IDMFT code for Fermi-Boson case. Version of March 1st, 2012, same as Version 1.1 above except that memory computation and some cleanup have been added
- Makefile
- idmft_3d_CAF =PBS script for submitting job on Hopper or Franklin.
- main_3D_lowrankLanczos.f90 =main program that manages the IDMFT loop.
- dimensions.f90 =definitions of the derived types and explanation of matrix structure.
- IDMFT_matrix.f90 =Construction of the matrix, as well as its distribution per Matsubara blocks
- random_vector.f90 =Generates random initial vectors, may or may not be used
- dotproduct.f90 =Computes the dot product on each Matsubara group, binary tree has a sync problem, using sequential reduction for now
- fermiboson.f90 =Contains everything specific to Falicov-Kimball Fermi-Boson model, impurity, density, etc.
- input_output.f90 =Reads the input file and outputs the solutions
- image_zgmres.f90 =Does GMRES on each of the subdomains independently. This is B^-1 in the notes. There is no preconditioning here, just reorthogonalization. The matrices are very small.
- matvec.f90 =Does the two required matvecs: one per block for image_zgmres, and the CAF_matvec of complex_gmres.
- preconditioning.f90 =This uses the B^-1 solution found by image_zgmres as a preconditioner.
- gramschmidt.f90 =Gram-Schmidt for the complex_gmres. The Gram-Schmidt for Lanczos is done directly inside the Lanczos loop. Full reorthogonalization.
- complex_gmres.f90 =Complex GMRES algorithm done on each Matsubara block (similar to MPI_COMM_Group, but using "sync images(Block_Matsubara)" instead).
- lowranklanczos.f90 =Low-rank Lanczos-based algorithm for finding the diagonal of the inverse of a complex symmetric matrix, entirely in CAF. Uses Freund's algorithm.
- INPUT.DAT =Input file for IDMFT and for the low-rank Lanczos code. There are very few internal parameters besides these (maybe 3 or 4 still remain in the source files). For example, image_zgmres.f90 does not print convergence; do "grep '(6' image_zgmres.f90", for example.
- out =Output obtained from a run on Hopper at NERSC. "grep beta out" shows low-rank-Lanczos convergence. "grep iteration= out" shows IDMFT convergence
- SOLUTIONS =Solutions obtained after the 5 iterations defined in INPUT. Files are Greens_function.(Matsubara_frequency). First column is the matrix index, second and third columns are the 2D cartesian index of the solution: For example, with gnuplot: "splot Greens_function.00 u 2:3:4" shows a plot of the Green's function solution.
- Notes
- Modified structure of MATLAB code from Yousef Saad, for code development into fortran code CAF
- largest (COMPLETED)
- interior (COMPLETED)
- smallest (COMPLETED)
- Steps to build the coarray solver:
- Extract the Lanczos code from the IDMFT code (DONE, see "Code to start with" below)
- Clean up the MATLAB routine (DONE)
- Build a set of modules based on the MATLAB functions above (IN THE COOKING STOVE)
- There are common routines that do not depend on the type of filter (start with those)
- FiltConjugResid_PolyOnly and the largest case are the easiest to start with and to check for convergence
- Once the largest case is done, the rest follows
- Code to start with: Laplacian All eigenvalues using domain decomposition. IDMFT code using a real Laplacian 7-pts stencil with domain decomposition of the matrix, shifted to [0,max_eigen].
- Notes
- Examples from Akin's book (no-coarrays yet; plain sequential)
- Starting from Akin's book, build parallel versions with CAF
Fortran 2008 examples:
Some more efficient versions of Lanczos.
GMRES codes in coarray fortran.
Low-rank Lanczos for fermionic-bosonic IDMFT in coarray fortran. Version 1
Low-rank Lanczos for fermionic-bosonic 3D IDMFT in coarray fortran. Version 1.1 (Feb 16, 2012)
Version [Feb 1st, 2012] of IDMFT with coarray fortran. Embarrassingly parallel version with domain decomposition (tar.gz file)
LATEST Version 1.2 [March 1st, 2012] of IDMFT with coarray fortran. Version with computation of memory used, valid for 3D and 2D (tar.gz file)
Polynomial filtering algorithm
Object-oriented co-array fortran
This web page is regularly updated (Winter 2011).
See also Chapel.