G0W0R calculation crashing
Posted: Tue May 28, 2024 2:09 pm
by bprobinson102
VASP team,
I am currently doing some benchmarking using low-scaling GW (G0W0R) due to my interest in finite-temperature properties. I have tested k-point meshes of 5x5x5, 7x7x7, and 9x9x9 as well as ENCUT values of 300, 500, and 800 eV.
I am currently having issues with the attached calculation, where I am using a 9x9x9 k-point mesh with ENCUT=800 eV. All prior calculations have run fine. I believe the error may be memory-related; however, I am not entirely sure.
Any thoughts/input would be great. If you have any further questions or need more information please let me know.
Brian
Re: G0W0R calculation crashing
Posted: Tue May 28, 2024 2:57 pm
by henrique_miranda
Maybe you can grep for the memory usage in the calculations that finished:
grep 'memory' OUTCAR
That way, you can get an idea of how much memory you are actually using.
In the `gw.OUTCAR` that you shared, it looks like the code crashes before writing the memory usage of the GW calculation.
Re: G0W0R calculation crashing
Posted: Tue May 28, 2024 10:28 pm
by bprobinson102
I would expect the memory to be about 230 GB/rank with NTAUPAR=8, which is too much. With NTAUPAR=2, however, it should be closer to 63 GB/rank, which should be enough. Even so, I still get the same error.
Re: G0W0R calculation crashing
Posted: Wed May 29, 2024 8:00 am
by henrique_miranda
I think this is a memory problem.
I tried to grep for memory in the OUTCAR you shared, but I get:
$ grep memory gw.OUTCAR
total amount of memory used by VASP MPI-rank0 47703. kBytes
available memory per node: 106.78 GB, setting MAXMEM to 109337
total amount of memory used by VASP MPI-rank0 226674. kBytes
This is still an incomplete OUTCAR file; the memory estimate for the GW part is written later in the run.
To get a better idea of the memory usage, I suggest reducing the size of the calculation and grepping for memory.
For example, I modified your files so that they run on my local machine: I used the default ENCUT=308.450 eV and reduced KPOINTS to a 3x3x3 mesh.
Then I grep for memory:
$ grep memory OUTCAR
total amount of memory used by VASP MPI-rank0 31753. kBytes
available memory per node: 6.60 GB, setting MAXMEM to 6756
total amount of memory used by VASP MPI-rank0 35056. kBytes
estimated memory requirement per rank 1371.8 MB, per node 5487.3 MB
memory high mark on MPI-rank0 inside Response functions allocated 85927. kBytes
memory high mark on MPI-rank0 inside RESPONSE_SUPER 1103737. kBytes
memory high mark on MPI-rank0 inside RESPONSE_SUPER 154051. kBytes
memory high mark on MPI-rank0 inside SIGMA_SUPER 1293329. kBytes
memory high mark on MPI-rank0 inside SIGMA_SUPER 1131900. kBytes
memory high mark on MPI-rank0 inside SIGMA_SUPER 1314821. kBytes
memory high mark on MPI-rank0 inside SIGMA_SUPER 1153392. kBytes
total amount of memory used by VASP MPI-rank0 83410. kBytes
total amount of memory used by VASP MPI-rank0 31775. kBytes
Maximum memory used (kb): 1481000.
Average memory used (kb): N/A
In this case, I need about 5 GB of memory per node.
I monitored the calculation with htop, and the estimate was indeed accurate.
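The same check can be scripted. The sketch below is a hypothetical helper (not part of VASP) that extracts the per-node estimate, the available memory per node, and the largest "memory high mark" from OUTCAR text, using the line formats quoted in this thread; other VASP versions may word these lines differently. Treating 1 GB as 1024 MB is an assumption based on the MAXMEM values printed by VASP (6.60 GB gives MAXMEM 6756).

```python
import re

# Line formats as quoted in this thread; whitespace is matched loosely because
# other VASP versions may pad these lines differently (assumption).
ESTIMATE_RE = re.compile(
    r"estimated memory requirement per rank\s+([\d.]+)\s*MB,\s*per node\s+([\d.]+)\s*MB"
)
AVAILABLE_RE = re.compile(r"available memory per node:\s*([\d.]+)\s*GB")
HIGH_MARK_RE = re.compile(r"memory high mark on MPI-rank0 inside\s.*?([\d.]+)\s*kBytes")


def summarize_memory(outcar_text):
    """Collect memory figures from OUTCAR text; a field is None when its line is
    missing, e.g. in a run that crashed before the GW memory estimate."""
    est = ESTIMATE_RE.search(outcar_text)
    avail = AVAILABLE_RE.search(outcar_text)
    marks = [float(v) for v in HIGH_MARK_RE.findall(outcar_text)]
    return {
        "per_rank_mb": float(est.group(1)) if est else None,
        "per_node_mb": float(est.group(2)) if est else None,
        "available_gb": float(avail.group(1)) if avail else None,
        "max_high_mark_kb": max(marks) if marks else None,
    }


def fits_in_node(summary):
    """True/False if the per-node estimate fits the available memory, None if unknown."""
    if summary["per_node_mb"] is None or summary["available_gb"] is None:
        return None
    # Assumes 1 GB = 1024 MB, as the MAXMEM values suggest.
    return summary["per_node_mb"] < summary["available_gb"] * 1024


# Lines copied from the 3x3x3 test run above.
sample = """\
 available memory per node: 6.60 GB, setting MAXMEM to 6756
 estimated memory requirement per rank 1371.8 MB, per node 5487.3 MB
 memory high mark on MPI-rank0 inside RESPONSE_SUPER 1103737. kBytes
 memory high mark on MPI-rank0 inside SIGMA_SUPER 1314821. kBytes
"""
s = summarize_memory(sample)
print(s["per_node_mb"], fits_in_node(s))
```

On a truncated OUTCAR like the shared `gw.OUTCAR`, `fits_in_node` returns None, which is a quick signal that the run died before the estimate was written.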
You mentioned that you ran some calculations with smaller k-point meshes and lower ENCUT; do those calculations crash as well?
If you grep for the memory usage of those calculations, what do you get?
Re: G0W0R calculation crashing
Posted: Wed May 29, 2024 3:21 pm
by bprobinson102
All other calculations ran successfully, with the memory requirements below.
k-point mesh, ENCUT: memory per node
5x5x5, 300 eV: 6 GB
5x5x5, 500 eV: 20 GB
5x5x5, 800 eV: 35 GB
9x9x9, 300 eV: 34 GB
9x9x9, 500 eV: 57 GB
Thanks,
Brian
Re: G0W0R calculation crashing
Posted: Fri May 31, 2024 2:14 pm
by bprobinson102
Adding to the list,
k-point mesh, ENCUT: memory per node
9x9x9, 600 eV: 62 GB
9x9x9, 700 eV: 81 GB
9x9x9, 750 eV: 92 GB
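To see how close an 800 eV run might sit to the 106.78 GB per node reported in `gw.OUTCAR`, one rough option is to fit a power law mem = a * ENCUT^b to the 9x9x9 numbers above and extrapolate. This is only a sketch: the power-law form and the choice of which points to fit are assumptions, not a VASP memory model.

```python
import math

# Memory per node (GB) reported in this thread for the 9x9x9 mesh, keyed by ENCUT (eV).
MEM_9X9X9 = {300: 34, 500: 57, 600: 62, 700: 81, 750: 92}


def fit_power_law(points):
    """Least-squares fit of mem = a * encut**b in log-log space.
    points: sequence of (encut_eV, mem_GB) pairs, at least two distinct encuts."""
    xs = [math.log(e) for e, _ in points]
    ys = [math.log(m) for _, m in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def predict(a, b, encut):
    return a * encut ** b


all_pts = sorted(MEM_9X9X9.items())
a_all, b_all = fit_power_law(all_pts)       # all five runs
a_top, b_top = fit_power_law(all_pts[-2:])  # only the 700 and 750 eV runs

print(f"all points: {predict(a_all, b_all, 800):.0f} GB (exponent {b_all:.2f})")
print(f"last two:   {predict(a_top, b_top, 800):.0f} GB (exponent {b_top:.2f})")
```

With all five points the fit gives about 92 GB at 800 eV, while using only the two highest cutoffs gives about 104 GB. The spread shows how unreliable such an extrapolation is when the result lands near the node limit, so on its own it cannot settle whether the crash is memory-related.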
Re: G0W0R calculation crashing
Posted: Fri May 31, 2024 2:46 pm
by henrique_miranda
Indeed, there seems to be a problem in a routine that should return an error; instead, it accesses memory out of bounds, leading to a segmentation fault.
This is not a memory issue but a bug.
I fixed this in the code in scala.F:
- IF ( .NOT. PRESENT( INFO ) ) THEN
+ IF ( PRESENT( INFO ) ) THEN
Then I get the following error message:
-----------------------------------------------------------------------------
| |
| EEEEEEE RRRRRR RRRRRR OOOOOOO RRRRRR ### ### ### |
| E R R R R O O R R ### ### ### |
| E R R R R O O R R ### ### ### |
| EEEEE RRRRRR RRRRRR O O RRRRRR # # # |
| E R R R R O O R R |
| E R R R R O O R R ### ### ### |
| EEEEEEE R R R R OOOOOOO R R ### ### ### |
| |
| One or more MPI groups includes at least one CPU that does not |
| carry data. This would cause ScaLAPACK to crash and is due to the |
| automatically selected processor grid for ScaLAPACK. |
| You can influence the processor grid (NPROW, NPCOL) by changing |
| NTAUPAR or NBANDS or both such that (NPCOL/NTAUPAR-1)*MB < NBANDS |
| as well as (NPROW/NTAUPAR-1)*MB < NBANDS. Values for this run are: |
| NTAUPAR = 8 |
| NBANDS = 256 |
| NPCOL = 4 |
| NPROW = 8 |
| MB = 64 |
| |
| ----> I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <---- |
| |
-----------------------------------------------------------------------------
The values in your case might be different.
You might be able to fix your calculation by changing NBANDS or NTAUPAR.
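The condition in the message reflects how ScaLAPACK distributes a matrix block-cyclically: along a grid dimension with P processes and block size MB, the last process owns any data only when (P-1)*MB < N. The sketch below reimplements the logic of ScaLAPACK's NUMROC routine in Python to list the processes that end up empty. Which P VASP actually uses for each NTAUPAR group is an assumption here; P=8 and P=4 with MB=64 and NBANDS=256 are borrowed from the message purely for illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of the n rows (or columns) owned by process iproc when they are
    distributed in blocks of nb over nprocs processes, starting at isrcproc.
    Follows the logic of ScaLAPACK's NUMROC tool routine."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    num = (nblocks // nprocs) * nb
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        num += nb          # this process gets one extra full block
    elif mydist == extrablks:
        num += n % nb      # this process gets the trailing partial block
    return num


def idle_processes(n, nb, nprocs):
    """Processes along one grid dimension that own no data."""
    return [p for p in range(nprocs) if numroc(n, nb, p, 0, nprocs) == 0]


def min_n_without_idle(nb, nprocs):
    """Smallest n for which the last process is non-empty: (nprocs - 1) * nb < n."""
    return (nprocs - 1) * nb + 1


for p in (8, 4):
    print(p, idle_processes(256, 64, p))  # NBANDS=256, MB=64 (illustrative)
print(min_n_without_idle(64, 8))
```

With 8 processes, 256 bands fill only four 64-wide blocks, so processes 4 to 7 are empty; either fewer processes along that dimension (e.g. via NTAUPAR) or more bands removes the idle ranks.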
Regarding NBANDS, my suggestion is to include all possible bands, which you can find by inspecting the OUTCAR file:
$ grep "maximum number of plane-waves:" OUTCAR
You will see that this number increases with ENCUT, so instead of converging ENCUT and NBANDS separately, you could simply converge ENCUT and always use the maximum number of bands.
Of course, including more bands also increases the memory requirements, so perhaps start by running with a lower ENCUT but a larger number of bands.
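If you want to automate this while scanning ENCUT, a small hypothetical helper can read the value from the OUTCAR instead of grepping by hand. It assumes the line looks like "maximum number of plane-waves: N" (possibly printed more than once) and takes the largest N; the sample text below is made up for illustration.

```python
import re

# Assumed OUTCAR line format: "maximum number of plane-waves:   N"
PW_RE = re.compile(r"maximum number of plane-waves:\s*(\d+)")


def max_plane_waves(outcar_text):
    """Largest N found on 'maximum number of plane-waves:' lines, or None."""
    values = [int(v) for v in PW_RE.findall(outcar_text)]
    return max(values) if values else None


def nbands_line(outcar_text):
    """Hypothetical INCAR line using the maximum number of plane waves as NBANDS."""
    n = max_plane_waves(outcar_text)
    return None if n is None else f"NBANDS = {n}"


# Made-up sample, for illustration only.
sample = """\
 maximum number of plane-waves:   1234
 maximum number of plane-waves:   1251
"""
print(nbands_line(sample))
```

Reading the number from each ENCUT run keeps NBANDS tied to the basis size, in line with the suggestion to converge ENCUT alone.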
Hope this helps, and thank you for bringing this bug to my attention!
Re: G0W0R calculation crashing
Posted: Mon Jun 03, 2024 2:42 pm
by bprobinson102
Thanks for catching that, I will check out your suggestions!