serial VASP works fine, parallel VASP doesn't


hurbina
Newbie
Posts: 1
Joined: Wed Oct 29, 2008 4:25 pm

#1 Post by hurbina » Wed Oct 29, 2008 7:15 pm

Hi all,

We have recently compiled VASP on an IA64 machine running SUSE Linux Enterprise Server 9, using the Intel C++/Fortran compilers (v10.1.0.18), the Intel Math Kernel Library (v10.0.5.025) and, for the parallel version of VASP, the Intel MPI libraries (v3.0).

We have successfully compiled both versions of VASP, but only the serial version runs our test correctly:

Code:

francisco@ihn:~/nfs/test> vasp
 vasp.4.6.31 08Feb07 complex 
 POSCAR found :  1 types and    1 ions
 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 WARNING: wrap around errors must be expected
 FFT: planning ...           1
 reading WAVECAR
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.391565884611E+02    0.39157E+02   -0.95953E+02    14   0.335E+02
DAV:   2     0.394990214392E+01   -0.35207E+02   -0.34385E+02    28   0.480E+01
DAV:   3    -0.158309924777E+00   -0.41082E+01   -0.39042E+01    14   0.376E+01
DAV:   4    -0.310260836369E+00   -0.15195E+00   -0.13836E+00    14   0.660E+00
DAV:   5    -0.313215188806E+00   -0.29544E-02   -0.29502E-02    28   0.907E-01    0.286E-01
DAV:   6    -0.314079169947E+00   -0.86398E-03   -0.18768E-03    14   0.397E-01    0.142E-01
DAV:   7    -0.314221165203E+00   -0.14200E-03   -0.21863E-04    14   0.149E-01    0.480E-02
DAV:   8    -0.314276237512E+00   -0.55072E-04   -0.26454E-05    14   0.469E-02
   1 F= -.31427624E+00 E0= -.16001392E+00  d E =-.308525E+00
 writing wavefunctions
francisco@ihn:~/nfs/test>


The parallel version, on the other hand, gives this error:

Code:

francisco@ihn:~/nfs/test> vasp_parallel 
 running on    1 nodes
 distr:  one band on    1 nodes,    1 groups
 vasp.4.6.31 08Feb07 complex 
 POSCAR found :  1 types and    1 ions
 scaLAPACK will be used
 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 WARNING: wrap around errors must be expected
 FFT: planning ...           1
vasp_parallel(30034): unaligned access to 0x60000ffffffed7a4, ip=0x4000000000713e60
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
vasp_parallel      40000000007143E0  Unknown               Unknown  Unknown
vasp_parallel      4000000000713B30  Unknown               Unknown  Unknown
vasp_parallel      40000000006CCDB0  Unknown               Unknown  Unknown
vasp_parallel      40000000006CE9B0  Unknown               Unknown  Unknown
vasp_parallel      40000000000667B0  Unknown               Unknown  Unknown
vasp_parallel      4000000000007790  Unknown               Unknown  Unknown
libc.so.6.1        20000000014CDC50  Unknown               Unknown  Unknown
vasp_parallel      4000000000007580  Unknown               Unknown  Unknown
francisco@ihn:~/nfs/test> 
If we try to run the parallel version on 4 nodes, we get a similar error:

Code:

francisco@ihn:~/nfs/test> mpirun -machinefile nodelist -n 4 vasp_parallel
 running on    4 nodes
 distr:  one band on    1 nodes,    4 groups
 vasp.4.6.31 08Feb07 complex 
 POSCAR found :  1 types and    1 ions
 scaLAPACK will be used
 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 WARNING: wrap around errors must be expected
 FFT: planning ...           1
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
vasp_parallel      40000000007143E0  Unknown               Unknown  Unknown
vasp_parallel      4000000000713B30  Unknown               Unknown  Unknown
vasp_parallel      40000000006CCDB0  Unknown               Unknown  Unknown
vasp_parallel      40000000006CE9B0  Unknown               Unknown  Unknown
vasp_parallel      40000000000667B0  Unknown               Unknown  Unknown
vasp_parallel      4000000000007790  Unknown               Unknown  Unknown
libc.so.6.1        20000000014CDC50  Unknown               Unknown  Unknown
vasp_parallel      4000000000007580  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
vasp_parallel      40000000007143E0  Unknown               Unknown  Unknown
vasp_parallel      4000000000713B30  Unknown               Unknown  Unknown
vasp_parallel      40000000006CCDB0  Unknown               Unknown  Unknown
vasp_parallel      40000000006CE9B0  Unknown               Unknown  Unknown
vasp_parallel      40000000000667B0  Unknown               Unknown  Unknown
vasp_parallel      4000000000007790  Unknown               Unknown  Unknown
libc.so.6.1        20000000014CDC50  Unknown               Unknown  Unknown
vasp_parallel      4000000000007580  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
vasp_parallel      40000000007143E0  Unknown               Unknown  Unknown
vasp_parallel      4000000000713B30  Unknown               Unknown  Unknown
vasp_parallel      40000000006CCDB0  Unknown               Unknown  Unknown
vasp_parallel      40000000006CE9B0  Unknown               Unknown  Unknown
vasp_parallel      40000000000667B0  Unknown               Unknown  Unknown
vasp_parallel      4000000000007790  Unknown               Unknown  Unknown
libc.so.6.1        20000000014CDC50  Unknown               Unknown  Unknown
vasp_parallel      4000000000007580  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
vasp_parallel      40000000007143E0  Unknown               Unknown  Unknown
vasp_parallel      4000000000713B30  Unknown               Unknown  Unknown
vasp_parallel      40000000006CCDB0  Unknown               Unknown  Unknown
vasp_parallel      40000000006CE9B0  Unknown               Unknown  Unknown
vasp_parallel      40000000000667B0  Unknown               Unknown  Unknown
vasp_parallel      4000000000007790  Unknown               Unknown  Unknown
libc.so.6.1        20000000014CDC50  Unknown               Unknown  Unknown
vasp_parallel      4000000000007580  Unknown               Unknown  Unknown
rank 2 in job 22  ihn_56881   caused collective abort of all ranks
  exit status of rank 2: return code 174 
francisco@ihn:~/nfs/test> 
(Here nodelist is the machinefile passed to mpirun; it lists the host ihn 4 times.)
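For reference, a machinefile of that form can be produced as sketched below. This is only an illustration: it assumes the plain one-hostname-per-line layout, and the temporary file created with mktemp is a stand-in for the file that is simply called nodelist in the run above.

```shell
#!/bin/sh
# Sketch: write a machinefile that names the single host "ihn" four times,
# one entry per line, matching the description of nodelist in the post.
# The temporary file is an illustrative assumption; the original file is
# just called "nodelist" and lives in the test directory.
nodelist=$(mktemp)

i=0
while [ "$i" -lt 4 ]; do
    echo "ihn"
    i=$((i + 1))
done > "$nodelist"

# Show the result: four lines, each containing only the hostname.
cat "$nodelist"

# The launch command from the post would then read (not executed here):
#   mpirun -machinefile "$nodelist" -n 4 vasp_parallel
```

With every entry pointing at the same host, all four MPI processes start on ihn, which is consistent with the four identical tracebacks in the output above.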

Here is the Makefile used to build the parallel version of VASP:

Code:

.SUFFIXES: .inc .f .f90 .F
SUFFIX=.f90

CPP_ =  ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

FFLAGS= -I/opt/intel/mkl/10.0.5.025/include/fftw -FR -lower_case

OFLAG=-O3 -xW -tpp7

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =

OBJ_NOOPT =
DEBUG  = -FR -O0
INLINE = $(OFLAG)

BLAS=-L/opt/intel/mkl/10.0.5.025/lib/64 -lmkl_intel_lp64 -lmkl_blacs_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

LAPACK=-L/opt/intel/mkl/10.0.5.025/lib/64 -lmkl_intel_lp64 -lmkl_blacs_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

LIB  = -L../vasp.4.lib -ldmy \
     ../vasp.4.lib/linpack_double.o $(LAPACK) \
     $(BLAS)

LINK    =

FFT3D= fftmpiw.o fftmpi_map.o fft3dlib.o /opt/intel/mkl/10.0.5.025/lib/64/libfftw3xf_intel.a

# MPI section, uncomment the following lines

FC=mpiifort
FCL=$(FC)

CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxIFC\" -DIFC \
     -Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
     -DMPI_BLOCK=500  \
     -DRPROMU_DGEMV  -DRACCMU_DGEMV -DscaLAPACK

SCA=/opt/intel/mkl/10.0.5.025/lib/64/libmkl_scalapack_lp64.a /opt/intel/mkl/10.0.5.025/lib/64/libmkl_blacs_intelmpi_lp64.a

LIB     = -L../vasp.4.lib -ldmy  \
      ../vasp.4.lib/linpack_double.o $(SCA) $(LAPACK) $(BLAS)

BASIC=   symmetry.o symlib.o   lattlib.o  random.o

SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o      setex.o     radial.o  \
         pseudo.o   mgrid.o    mkpoints.o wave.o      wave_mpi.o  $(BASIC) \
         nonl.o     nonlr.o    dfast.o    choleski2.o    \
         mix.o      charge.o   xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         metagga.o  constrmag.o pot.o      cl_shift.o force.o    dos.o      elf.o      \
         tet.o      hamil.o    steep.o    \
         chain.o    dyna.o     relativistic.o LDApU.o sphpro.o  paw.o   us.o \
         ebs.o      wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         dipol.o    xclib.o    chgloc.o   subrot.o   optreal.o   davidson.o \
         edtest.o   electron.o shm.o      pardens.o  paircorrection.o \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o \
         elpol.o    setlocalpp.o aedens.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
        rm -f vasp
        $(FCL) -o vasp $(LINK) main.o  $(SOURCE)   $(FFT3D) $(LIB) 
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
        $(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
        $(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
        $(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB) 
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
        $(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
        $(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:  
        -rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
        $(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
        $(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
        $(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
        $(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F

base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
        $(CPP)
        $(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
        $(CPP)
        $(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
        $(CPP)
        $(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
        $(CPP)
        $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
        $(CPP)
$(SUFFIX).o:
        $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

fft3dlib.o : fft3dlib.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -tpp7 -xW -prefetch- -unroll0 -vec_report3 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
        $(CPP)
        $(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
        $(CPP)
        $(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
        $(CPP)
        $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

LDApU.o : LDApU.F
        $(CPP)
        $(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
Any ideas?
Last edited by hurbina on Wed Oct 29, 2008 7:15 pm, edited 1 time in total.

support_vasp
Global Moderator
Posts: 1817
Joined: Mon Nov 18, 2019 11:00 am

Re: serial VASP works fine, parallel VASP doesn't

#2 Post by support_vasp » Wed Sep 04, 2024 12:15 pm

Hi,

We're sorry that we didn’t answer your question. This does not live up to the quality of support that we aim to provide. The team has since expanded. If we can still help with your problem, please ask again in a new post, linking to this one, and we will answer as quickly as possible.

Best wishes,

VASP

