
Problem when using VASP parallelized using MPI + OpenMP

Posted: Sun Jun 25, 2023 2:07 pm
by hatedark1
Dear fellows,

I have been running VASP parallelized with MPI for a few months on a local computer cluster. I have just installed VASP on an HPC facility using the "makefile.include.intel_omp" archetype in order to use MPI + OpenMP.

When I run VASP as I do on my local cluster, for example with "#PBS -l nodes=1:ppn=4", "mpirun -np 4 vasp_std", and "NCORE = 4" in the INCAR file, I get no errors. When I try to run VASP parallelized with MPI + OpenMP, for instance with "#PBS -l nodes=1:ppn=4", "export OMP_NUM_THREADS=2", "mpirun -np 2 vasp_std", and "NCORE = 2", I get several warnings in the stdout that say "WARNING: Sub-Space-Matrix is not hermitian in DAV" and the final error "Error EDDDAV: Call to ZHEGV failed. Returncode = 15 2 16". I know this error can happen when the matrix is distributed over an inappropriate number of cores and ends up non-Hermitian, which halts the diagonalization. When I had this issue with the MPI-only version, it was solved simply by adjusting the "NCORE" tag in the INCAR file; however, I have tried several combinations of "OMP_NUM_THREADS", "-np", and "NCORE" without success.
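For reference, a rough sketch of the two job setups I am comparing (only the lines relevant to the parallelization; everything else in the scripts and input files is identical):

Code: Select all

# MPI only (local cluster, runs fine)
#PBS -l nodes=1:ppn=4
mpirun -np 4 vasp_std              # INCAR has NCORE = 4

# MPI + OpenMP (HPC facility, fails in EDDDAV)
#PBS -l nodes=1:ppn=4
export OMP_NUM_THREADS=2           # 2 OpenMP threads per MPI rank
mpirun -np 2 vasp_std              # INCAR has NCORE = 2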

The relevant files are attached below. Could someone help me with this?

Best regards,
Renan Lira.

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Sun Jun 25, 2023 3:17 pm
by fabien_tran1
Hi,

If I understand correctly, the problem occurs only with the new installation on the HPC facility, right? Does it occur systematically, independently of the solid that is considered or of choices in the INCAR such as ALGO, ISMEAR, or the functional (I see that you are using the Tkatchenko-Scheffler method)?

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Sun Jun 25, 2023 9:57 pm
by hatedark1
Hello,

Yes, the problem only occurs with the new installation on the HPC facility, when trying to use VASP parallelized with MPI + OpenMP (I don't get errors when using only MPI).

I ran VASP for a different system using different parameters, such as ALGO=Normal, ISMEAR=0 and IVDW=202 (many-body dispersion energy method), and VASP ran successfully. Which parameter do you think is the culprit? I will run a few more calculations, changing one parameter at a time, to try to isolate the problem.

Although it ran successfully, I noticed that the header of the stdout is different from what I get when running MPI only. With MPI only I get

Code: Select all

running 4 mpi-ranks, with 1 threads/rank, on 1 nodes
distrk: each k-point on 4 cores, 1 groups
distr: one band on 4 cores, 1 groups

whereas with MPI + OpenMP I get

Code: Select all

running 2 mpi-ranks, with 2 threads/rank, on 1 nodes
distrk: each k-point on 2 cores, 1 groups
distr: one band on 1 cores, 2 groups

I can't get the band distribution to run on 2 cores, 1 group like the k-point distribution. What is wrong here?

Best regards,
Renan Lira.

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Mon Jun 26, 2023 9:47 am
by fabien_tran1
I have run your system in MPI+OpenMP mode (I also compiled with Intel 2022.0.1 and used makefile.include.intel_omp) and the calculation ran properly. There may be a problem with your installation.

Here (forum/viewtopic.php?f=2&t=19026) you reported that you switched to a more recent version of the Intel compiler. Did you execute "make veryclean" before recompiling with the more recent compiler? If not, then please do so and try to run your system again.
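For example, a minimal sketch of the clean rebuild (assuming you build from the VASP root directory and that makefile.include already points to the new compiler):

Code: Select all

make veryclean    # remove all objects and binaries built with the old compiler
make std          # rebuild vasp_std from scratch (add "gam"/"ncl" if you need those executables)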

Concerning your question "I can't get the band distribution to run on 2 cores, 1 group as the k-point distribution. What is wrong here?", this should be due to what is written at Combining_MPI_and_OpenMP:
"The main difference between the pure MPI and the hybrid MPI/OpenMP version of VASP is that the latter will not distribute a single Bloch orbital over multiple MPI ranks but will distribute the work on a single Bloch orbital over multiple OpenMP threads. "

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Tue Jun 27, 2023 2:28 pm
by hatedark1
Hello,
fabien_tran1 wrote: I have run your system in MPI+OpenMP mode (I also compiled with Intel 2022.0.1 and used makefile.include.intel_omp) and the calculation ran properly. There may be a problem with your installation.
After successfully running the different system I mentioned, I went back to my original system with ALGO=Normal, ISMEAR=0 and IVDW=202, but it did not work. Then I returned these variables to ALGO=Fast, ISMEAR=-5 and IVDW=20 (as in the attached file), but changed PREC=Accurate to PREC=Normal, and it ran with no problems. What is going on here?
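To summarize the combinations I have tried so far on the HPC machine (only the tags I changed; everything else is identical to the attached files):

Code: Select all

# Different system: ALGO = Normal, ISMEAR = 0,  IVDW = 202                  -> runs
# Original system:  ALGO = Normal, ISMEAR = 0,  IVDW = 202                  -> fails
# Original system:  ALGO = Fast,   ISMEAR = -5, IVDW = 20, PREC = Accurate  -> fails
# Original system:  ALGO = Fast,   ISMEAR = -5, IVDW = 20, PREC = Normal    -> runs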
fabien_tran1 wrote: Did you execute "make veryclean" before recompiling with the more recent compiler?
Yes, I executed "make veryclean" before recompiling with the other compiler.

Thank you for the explanation regarding band parallelization on MPI + OpenMP mode.

Best regards,
Renan Lira.

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Tue Jun 27, 2023 2:36 pm
by fabien_tran1
It is difficult to say what the problem is. Can you provide the files of the case that does not work?

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Tue Jun 27, 2023 2:40 pm
by hatedark1
I should mention that to install VASP, I used a makefile.include that was provided by the support team from the HPC facility. I attached it to this message.

The files I provided are for the case that does not work.

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Tue Jun 27, 2023 9:37 pm
by fabien_tran1
I could again run your system without any problem with the same settings as yours:
- vasp.6.4.1
- Compilation with the makefile.include that you provided (I only adapted the paths for HDF5 and WANNIER90)
- MPI+OpenMP mode: "mpirun -np 2 ~/vasp-release.6.4.1/bin/vasp_std" with OMP_NUM_THREADS=2

Which machines/processors are you using?

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Thu Jun 29, 2023 4:26 pm
by hatedark1
Hello,

I am using a Dell EMC R6525, which has two AMD EPYC 7662 processors.

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Thu Jun 29, 2023 7:45 pm
by fabien_tran1
There is something that is confusing. In a previous post you wrote:
After successfully running the different system I mentioned, I went back to my original system with ALGO=Normal, ISMEAR=0 and IVDW=202, but it did not work. Then I returned these variables to ALGO=Fast, ISMEAR=-5 and IVDW=20 (as in the attached file), but changed PREC=Accurate to PREC=Normal, and it ran with no problems. What is going on here?
What does it mean? Does it mean that "ALGO=Fast, ISMEAR=-5 and IVDW=20" was originally producing the error, but then later worked?
hatedark1 wrote: I am using a Dell EMC R6525, which has two AMD EPYC 7662 processors.
This concerns the HPC facility where the error occurs, right?

Something that I somehow forgot: in general, some options should be set when combining MPI and OpenMP, as indicated at Combining_MPI_and_OpenMP. In particular, try the following:

Code: Select all

mpirun -np 2 --bind-to core --report-bindings --map-by ppr:2:node:PE=2 -x OMP_NUM_THREADS=2 -x OMP_PLACES=cores -x OMP_PROC_BIND=close path_to_vasp_std
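For clarity, this is what the individual options do (assuming Open MPI as the launcher; other MPI implementations, e.g. Intel MPI, use different flags):

Code: Select all

# --bind-to core              bind each MPI rank to cores
# --report-bindings           print the resulting binding map (to stderr) so it can be checked
# --map-by ppr:2:node:PE=2    place 2 ranks per node and give each rank 2 processing elements (cores)
# -x VAR=value                export an environment variable to all ranks
# OMP_PLACES=cores            place each OpenMP thread on its own core
# OMP_PROC_BIND=close         keep a rank's threads on cores close to each other
mpirun -np 2 --bind-to core --report-bindings --map-by ppr:2:node:PE=2 \
    -x OMP_NUM_THREADS=2 -x OMP_PLACES=cores -x OMP_PROC_BIND=close path_to_vasp_std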

Re: Problem when using VASP parallelized using MPI + OpenMP

Posted: Tue Jul 04, 2023 7:41 pm
by hatedark1
Thanks for the reply.
fabien_tran1 wrote: What does it mean? Does it mean that "ALGO=Fast, ISMEAR=-5 and IVDW=20" was originally producing the error, but then later worked?
ALGO=Fast, ISMEAR=-5 and IVDW=20 with PREC=Accurate were causing the errors. When I changed PREC to Normal, it worked. However, I need the higher precision.
fabien_tran1 wrote: This concerns the HPC facility where the error occurs, right?
Yes, the EPYC 7662 processors are the ones on the HPC facility where I get the errors.
fabien_tran1 wrote: Something that I somehow forgot: in general, some options should be set when combining MPI and OpenMP, as indicated at Combining_MPI_and_OpenMP. In particular, try the following:

Code: Select all

mpirun -np 2 --bind-to core --report-bindings --map-by ppr:2:node:PE=2 -x OMP_NUM_THREADS=2 -x OMP_PLACES=cores -x OMP_PROC_BIND=close path_to_vasp_std
I will try that as soon as possible and post the results.

Best regards,
Renan Lira.