Different total energy when running parallel version

slaubach
Newbie
Posts: 5
Joined: Tue Nov 23, 2004 12:16 pm

#1 Post by slaubach » Tue Dec 05, 2006 9:17 am

I am not sure whether I should have posted my question in the Installation forum, but since the program runs and only produces a strange result, I am trying here.

I am finally able to run the parallel version of VASP, yippee!
But in my first tests, a simple MgO calculation, I found different total energies between the parallel version and the normal (serial) version.

My INCAR:

PREC = Accurate
ENCUT = 400.000
IALGO = 48
NELM = 60
NELMIN = 2
EDIFF = 1.0e-04
EDIFFG = -0.02
VOSKOWN = 1
NBLOCK = 1
ISPIN = 1
INIWAV = 1
ISTART = 0
ICHARG = 2
LWAVE = .FALSE.
LCHARG = .TRUE.
ADDGRID = .FALSE.
ISMEAR = -4
SIGMA = 0.2
LREAL = .FALSE.
RWIGS = 1.36 0.73
NEDOS = 1500
NPAR = 4

I also ran the parallel calculation without the NPAR parameter, with the same result.

The normal version gives TOTEN: -11.100358 eV
The parallel version with one node: -9.857838 eV
With any other number of nodes: -9.722229 eV

What went wrong?
I would be really grateful for any help provided.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#2 Post by admin » Tue Dec 05, 2006 2:27 pm

Most probably, NBANDS differed between the two runs.
Please note that the number of bands (including empty bands!) used for the electronic SCF procedure is always an integer multiple of the number of nodes used in the parallel run.
A few empty bands, however, are needed to provide the necessary variational freedom in the electronic SCF procedure (please have a look at the chapter "Theoretical Background: Algorithms used in VASP to calculate the electronic groundstate" for further information).
---> Set NBANDS in the serial run to the value that was used in the parallel run; you can find it by grepping for NBANDS in OUTCAR. The results should then be equal. (The same should always be done when parallel runs use different numbers of CPUs.)
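The NBANDS check can also be scripted. The following is only a minimal Python sketch: the function names are hypothetical, the sample OUTCAR line is an assumed layout rather than real output, and the rounding helper simply applies the "integer multiple of the number of nodes" rule as stated above, not VASP's actual internal logic.

```python
import re


def read_nbands(outcar_text):
    """Return the last NBANDS value found in OUTCAR text, or None if absent."""
    matches = re.findall(r"NBANDS\s*=\s*(\d+)", outcar_text)
    return int(matches[-1]) if matches else None


def round_up_to_multiple(nbands, n_nodes):
    """Smallest integer multiple of n_nodes that is >= nbands."""
    return -(-nbands // n_nodes) * n_nodes


# Illustrative line only; the exact OUTCAR layout is an assumption here.
sample_outcar = "   number of bands    NBANDS=     12\n"
print(read_nbands(sample_outcar))

# A serial default of 11 bands, spread over 4 nodes, per the rule above.
print(round_up_to_multiple(11, 4))
```

In practice one would pass the contents of each run's OUTCAR to read_nbands and then set NBANDS explicitly in the INCAR of the run that reported the smaller value.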

slaubach
Newbie
Posts: 5
Joined: Tue Nov 23, 2004 12:16 pm

#3 Post by slaubach » Tue Dec 05, 2006 3:05 pm

First of all I have to tell you how much I appreciate the fast help one can get in this forum.
Thank you.

Sadly, NBANDS does not seem to be the answer to my question.
In my serial run I had NBANDS = 11. The parallel run had NBANDS = 12 with more than 1 node, but also NBANDS = 11 when running on 1 node.
So this cannot be the reason for the different total energies between the serial calculation and the parallel version running on 1 node.

But to be sure, I restarted the serial run with NBANDS = 12.
Here I get TOTEN = -10.537870 eV, which is still a big difference from the -9.722229 eV of the parallel run.
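To put a number on "big difference", here is a small Python sketch that compares the remaining gap with the EDIFF convergence threshold from the INCAR above. The variable names are made up for illustration, and the comparison with EDIFF is only an order-of-magnitude sanity check, not a formal criterion.

```python
# Values quoted in this thread, all in eV.
EDIFF = 1.0e-04                      # electronic convergence threshold from the INCAR
toten_serial_nbands12 = -10.537870   # serial run restarted with NBANDS = 12
toten_parallel = -9.722229           # parallel run on more than one node

gap = abs(toten_serial_nbands12 - toten_parallel)

print(f"gap between runs: {gap:.6f} eV")
print(f"gap in units of EDIFF: {gap / EDIFF:.0f}")
```

The gap of roughly 0.8 eV is thousands of times larger than EDIFF, so it is not just convergence noise.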

So I am still lost.

tjf
Full Member
Posts: 107
Joined: Wed Aug 10, 2005 1:30 pm
Location: Leiden, Netherlands

#4 Post by tjf » Thu Dec 07, 2006 9:56 am

In case my email got killed by a spam filter: you should post a POSCAR (or at least your source for the structure) and I'll see what my builds produce.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#5 Post by admin » Tue Dec 12, 2006 2:01 pm

Please check that everything (apart from the number of processors and the serial/parallel executables) is consistent. This includes:
--) all input parameters (of course)
--) the code version (of course)
--) the compilers and libraries (not only the compiler itself, but also the optimization level and other compilation parameters you use)
--) possibly the processor types (very rarely, processors may themselves be defective and produce slightly deviating results)
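The first item, consistent input parameters, is easy to check mechanically. Below is a rough Python sketch that compares two INCAR files given as text. The helper names are hypothetical, and the parser is deliberately simplified: it assumes one "TAG = value" pair per line and treats "#" and "!" as comment markers, so it does not cover every INCAR syntax VASP accepts.

```python
def parse_incar(text):
    """Parse simple 'TAG = value' lines into a dict (simplified INCAR reader)."""
    params = {}
    for line in text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].split("!", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise tag case and collapse repeated spaces in the value.
        params[key.strip().upper()] = " ".join(value.split())
    return params


def diff_params(a, b):
    """Return {tag: (value_in_a, value_in_b)} for every tag that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


serial_incar = "ENCUT = 400.000\nISMEAR = -4\nNBANDS = 12\n"
parallel_incar = "ENCUT = 400.000\nISMEAR = -4\nNPAR = 4\n"

for tag, (left, right) in diff_params(
    parse_incar(serial_incar), parse_incar(parallel_incar)
).items():
    print(f"{tag}: serial={left} parallel={right}")
```

Tags that are expected to differ between serial and parallel runs, such as NPAR, still show up in the output and have to be judged by eye.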
