Memory issues in large cell Wannierisations with SCDM method

Posted: Fri Dec 08, 2023 11:38 am
by PatrickJTaylor
Hi Folks,

I've been calculating Wannier functions from VASP (6.4.2) using the Wannier90 interface recently and have now encountered a problem.

I am trying to use the SCDM method to obtain initial projections for a relatively large structure (577 bands to be Wannierised). Unfortunately, my calculations run out of memory every time the SCDM procedure attempts to run, i.e. the .mmn and .eig files are written correctly, but then VASP crashes and the system error files indicate a memory problem.

I have used the same inputs for smaller systems (i.e. same INCAR and POTCAR minus changes to NUM_WANN etc) with no such problems, which leads me to believe that there is no inherent problem with my methodology (I have attached an archive just in case).

Any advice as to how I can get these calculations to run would be much appreciated. It's worth noting that switching to the standard Wannier projection scheme (i.e. defining the initial projections via WANNIER90_WIN) does solve the problem, but I am particularly keen to get SCDM working for this system, as I suspect that a good initial guess for the projections will be hard to obtain from chemical intuition in this case.

Re: Memory issues in large cell Wannierisations with SCDM method

Posted: Mon Dec 11, 2023 10:02 am
by manuel_engel1
Hi Patrick,

From what I can tell, your assessment is absolutely correct. The code runs out of memory during the SCDM routine. Specifically, there is a rank-revealing QR decomposition that requires a lot of memory (total number of bands times number of plane waves). Unfortunately, this happens independently on every MPI rank, which is likely the cause of your memory problems. We are working on improving this situation in the future.
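To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It only multiplies out the "bands times plane waves" size mentioned above, once per MPI rank on a node. The 16 bytes per element (double-precision complex), the plane-wave count and the function names are my assumptions for illustration, not values taken from VASP, and the real routine will need workspace on top of this.

```python
# Rough estimate of the SCDM QR matrix footprint per node.
# Assumptions (not taken from VASP): 16 bytes per element
# (double-precision complex) and one full copy of the matrix per MPI rank.

def scdm_matrix_bytes(n_bands, n_plane_waves, bytes_per_element=16):
    """Size in bytes of one (bands x plane waves) matrix."""
    return n_bands * n_plane_waves * bytes_per_element


def node_footprint_gib(n_bands, n_plane_waves, ranks_per_node,
                       bytes_per_element=16):
    """Memory in GiB if every rank on a node holds its own copy."""
    per_rank = scdm_matrix_bytes(n_bands, n_plane_waves, bytes_per_element)
    return ranks_per_node * per_rank / 2**30


if __name__ == "__main__":
    n_bands = 577            # bands to be Wannierised, from the first post
    n_plane_waves = 500_000  # hypothetical; check your own OUTCAR
    for ranks in (128, 32, 8, 1):
        gib = node_footprint_gib(n_bands, n_plane_waves, ranks)
        print(f"{ranks:4d} ranks per node: about {gib:.1f} GiB")
```

Because the footprint scales linearly with the number of ranks sharing a node, reducing the rank count is the most direct lever.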

Sadly, there is not a lot you can do in terms of INCAR tags. However, you can try to reduce the number of MPI ranks that share the same physical memory. Preferably, you want to restart from an already converged calculation.

Example procedure:
  1. Converge your system with high accuracy using many cores, and write the WAVECAR file.
  2. Start a new run with a lot fewer cores. Read the WAVECAR file, set NELM=1 (or even 0) and try to do SCDM.
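To choose how many cores to use in step 2, one option is to work backwards from the memory available on a node. The following is only a sketch under the same picture as above (one bands-times-plane-waves matrix per rank). The 16 bytes per element, the fraction of memory held back for everything else VASP needs, and the function name are all assumptions of mine, so treat the result as a starting point and not as a guarantee.

```python
# Sketch: how many MPI ranks can share one node before the per-rank
# SCDM matrices exceed a memory budget.
# Assumptions (not from VASP): 16 bytes per element, and a
# user-chosen fraction of node memory reserved for everything else.

def max_ranks_per_node(mem_per_node_gib, n_bands, n_plane_waves,
                       bytes_per_element=16, reserve_fraction=0.5):
    """Largest rank count whose combined matrices fit in the budget."""
    per_rank = n_bands * n_plane_waves * bytes_per_element
    usable = mem_per_node_gib * 2**30 * (1.0 - reserve_fraction)
    return int(usable // per_rank)


if __name__ == "__main__":
    # Hypothetical node with 256 GiB and a hypothetical plane-wave count.
    ranks = max_ranks_per_node(256, n_bands=577, n_plane_waves=500_000)
    print(f"Try at most {ranks} ranks per node for the SCDM restart")
```

If the result is 0, even a single rank would not fit within the chosen budget on that node.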