
VASP 4.6 refuses to run on more than one host

Posted: Mon Apr 05, 2010 12:34 pm
by karl.vollmer
When running the following command on 8-core boxes (which should cause VASP to span two physical compute nodes):

Code: Select all

mpirun -v -np 16 -machinefile $TMPDIR/machines /share/apps/vasp/4.6/vasp > v.out
I always end up with 16 processes on the first node and 0 on the second. Compiled with gfortran. There are no errors during compilation, and the output file appears to correctly indicate that VASP is running in parallel. This fails both when running mpirun directly and when submitting through SGE (version GE 6.2u4). MPI on the machines has been verified using MPIRING and by transferring 500 MB between multiple nodes without error.

I haven't worried about any optimizations yet. Step one is to get it running; then I'll worry about making it faster. Any guidance would be appreciated. I've run through these forums and can't find any references to similar issues.
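As a quick sanity check that leaves VASP out of the picture, a placement test along these lines (same machinefile, plain hostname in place of the vasp binary) shows where the ranks actually land:

Code: Select all

# each rank prints its host; count how many ranks end up on each node
mpirun -np 16 -machinefile $TMPDIR/machines hostname | sort | uniq -c
If the machinefile lists both hosts, this should report 8 ranks per node; seeing all 16 on the first node reproduces the problem without VASP involved.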

::Environment info::
output from vasp

Code: Select all

 running on   16 nodes
 distr:  one band on    1 nodes,   16 groups
 vasp.4.6.36 17Feb09 complex 
linked libs

Code: Select all

ldd vasp
        liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x00002b1c33292000)
        libblas.so.3 => /usr/lib64/libblas.so.3 (0x00002b1c3399b000)
        libmpi.so.0 => /opt/SUNWhpc/HPC8.2.1/gnu/lib/lib64/libmpi.so.0 (0x00002b1c33bef000)
        libopen-rte.so.0 => /opt/SUNWhpc/HPC8.2.1/gnu/lib/lib64/libopen-rte.so.0 (0x00002b1c33d96000)
        libopen-pal.so.0 => /opt/SUNWhpc/HPC8.2.1/gnu/lib/lib64/libopen-pal.so.0 (0x00002b1c33ee3000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x000000380f600000)
        librt.so.1 => /lib64/librt.so.1 (0x000000380d600000)
        libgfortran.so.1 => /usr/lib64/libgfortran.so.1 (0x00002b1c3405b000)
        libm.so.6 => /lib64/libm.so.6 (0x000000380c600000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000000380ca00000)
        libutil.so.1 => /lib64/libutil.so.1 (0x000000381a400000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000380ce00000)
        libmpi_f77.so.0 => /opt/SUNWhpc/HPC8.2.1/gnu/lib/lib64/libmpi_f77.so.0 (0x00002b1c342f4000)
        libmpi_f90.so.0 => /opt/SUNWhpc/HPC8.2.1/gnu/lib/lib64/libmpi_f90.so.0 (0x00002b1c34427000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003819c00000)
        libc.so.6 => /lib64/libc.so.6 (0x000000380c200000)
        /lib64/ld-linux-x86-64.so.2 (0x000000380be00000)
makefile

Code: Select all

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for gf90 compiler
# This makefile has not been tested by the vasp crew. 
# It is supplied as is.
#-----------------------------------------------------------------------
#
# Mind that some Linux distributions (Suse 6.1) have a bug in 
# libm causing small errors in the error-function (total energy
# is therefore wrong by about 1meV/atom). The recommended
# solution is to update libc.
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
#   retrieve the lapackage from ftp.netlib.org
#   and compile the blas routines (BLAS/SRC directory)
#   please use g77 or f77 for the compilation. When I tried to
#   use pgf77 or pgf90 for BLAS, VASP hung when calling
#   ZHEEV  (however this was with lapack 1.1 now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#   for a list of optimized BLAS try
#     http://www.kachinatech.com/~hjjou/scilib/opt_blas.html
#
# the two most reliable packages around are presently:
# 3a) Intels own optimised BLAS (PIII, P4, Itanium)
#     http://developer.intel.com/software/products/mkl/
#   this is really excellent when you use Intel CPU's
#
# 3b) or obtain the atlas based BLAS routines
#     http://math-atlas.sourceforge.net/
#   you certainly need atlas on the Athlon, since the  mkl
#   routines are not optimal on the Athlon.
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f 
SUFFIX=.f

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=gfortran
# fortran linker
FCL=$(FC)

#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
#  CPP_   =  /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C 
#
# that's probably the right line for some Red Hat distribution:
#
#  CPP_   =  /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
#  SUSE 6.X, maybe some Red Hat distributions:

CPP_ =  ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf             charge density   reduced in X direction
# wNGXhalf            gamma point only reduced in X direction
# avoidalloc          avoid ALLOCATE if possible
# IFC                 work around some IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000 P4
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (usually  faster)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (faster on P4)
#  **** definitely use -DRACCMU_DGEMV if you use the mkl library
#-----------------------------------------------------------------------

CPP    = $(CPP_) -DHOST=\"LinuxGfortran\" \
          -Dkind8 -DNGXhalf -DCACHE_SIZE=8000 -DGfortran -Davoidalloc \
          -DRPROMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags  (there must be a trailing blank on this line)
# the -Mx,119,0x200000 is required if you use older pgf90 versions
# on a more recent LINUX installation
# the option will not do any harm on other 3.X pgf90 distributions
#-----------------------------------------------------------------------

FFLAGS =  -ffree-form -ffree-line-length-none

#-----------------------------------------------------------------------
# optimization,
# we have tested whether higher optimisation improves
# the performance, and found no improvements with -O3-5 or -fast
# (even on Athlon systems, Athlon-specific optimisation worsens performance)
#-----------------------------------------------------------------------

OFLAG  = -O2

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -g -O0
INLINE = $(OFLAG)
#-----------------------------------------------------------------------
# the following lines specify the position of BLAS  and LAPACK
# what you choose is very system dependent
# P4: VASP works fastest with Intels mkl performance library
# Athlon: Atlas based BLAS are presently the fastest
# P3: no clue
#-----------------------------------------------------------------------

# Atlas based libraries
ATLASHOME= /usr/local/atlas/lib
#BLAS=   -L/usr/local/atlas/lib -lblas
BLAS=   -L$(ATLASHOME)  -lf77blas -latlas

# use specific libraries (default library path points to other libraries)
BLAS= $(ATLASHOME)/libf77blas.a $(ATLASHOME)/libatlas.a

# use the mkl Intel libraries for p4 (www.intel.com)
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4  -lpthread

# LAPACK, simplest use vasp.4.lib/lapack_double
LAPACK= ../vasp.4.lib/lapack_double.o

# use atlas optimized part of lapack
#LAPACK= ../vasp.4.lib/lapack_atlas.o  -llapack -lblas

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack

#LAPACK= -L/usr/local/atlas/lib -llapack

#-----------------------------------------------------------------------

LIB  = -L../vasp.4.lib -ldmy \
     ../vasp.4.lib/linpack_double.o $(LAPACK) \
     $(BLAS)

# options for linking (none required)
LINK    =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.5 can use FFTW (http://www.fftw.org)
# since the FFTW is very slow for radices 2^n the fft3dlib is used
# in these cases
# if you use fftw3d you need to insert -lfftw in the LIB line as well
# please do not send us any queries related to FFTW (no support)
# if it fails, use fft3dlib
#-----------------------------------------------------------------------

FFT3D   = fft3dfurth.o fft3dlib.o
#FFT3D   = fftw3d+furth.o fft3dlib.o
FC=mpif90
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxPgi\" \
     -Dkind8 -DNGZhalf -DCACHE_SIZE=8000 -DPGF90 -Davoidalloc -DRPROMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

BLACS=/usr/local/BLACS_lam
SCA_= /usr/local/SCALAPACK_lam

SCA= $(SCA_)/scalapack_LINUX.a $(SCA_)/pblas_LINUX.a $(SCA_)/tools_LINUX.a \
 $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB     = -L../vasp.4.lib -ldmy  \
      ../vasp.4.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS)

# FFT: only option  fftmpi.o with fft3dlib of Juergen Furthmueller

FFT3D   = fftmpi.o fftmpi_map.o fft3dlib.o

VASP 4.6 refuses to run on more than one host

Posted: Mon Apr 05, 2010 1:04 pm
by karl.vollmer
Naturally, right after posting I figured it out. It looks like it was due to a version change in OpenMPI, which caused it to no longer ignore the old MPICH-style mpirun command in users' jobs. Confirmed so far with OpenMPI > 1.3.4; I'm sure there's something in their changelog that reflects this.
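In case it helps anyone else, the job line can be written with Open MPI's own option names instead of the MPICH-style invocation. This is just a sketch; the paths and the 8-slots-per-node layout are assumptions based on the setup above:

Code: Select all

mpirun -np 16 --hostfile $TMPDIR/machines -npernode 8 /share/apps/vasp/4.6/vasp > v.out
With a tightly integrated SGE parallel environment, Open MPI can also pick the allocation up from SGE directly, in which case the hostfile argument shouldn't even be needed.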