I have VASP 4.6 running happily on an SGI Altix. (The details of the machine are on the SARA website for those that care, http://www.sara.nl/userinfo/aster/descr ... index.html)
As I've read here somewhere, ifort 8.1, parallel VASP and FFTW don't play nicely together. VASP's built-in FFTs work like a charm, though.
I based my makefile on the efc itanium makefile.
One issue which isn't really clear to me though is how to set CACHE_SIZE. Some timing tests show the results below, where the CACHE_SIZE on the left gives a runtime on the right, in seconds.
CACHE_SIZE  runtime (s)
         0        10259
      1000         6627
      3000         4845
      5000         4709
      8000         5406
     10000         4701
     15000         4335
     20000         8454
     30000         6883
     50000         5162
    100000         6951
These are single runs on 8 CPUs, so other stuff is clearly going on to screw up the timings. Still, I'm using 15000. What would one expect to be optimal on a machine with 3 MB cache per processor? (I do assume that the CACHE_SIZE parameter should be set on a per processor basis, not on cache over the whole CPU set.)
Anyone else care to share their experiences with Altix/other Itanium SMP?
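Since these are single runs, the ranking can only be indicative. As a rough sketch (not part of the original post), the timings above can be sorted and expressed relative to the fastest run to see how flat the region around the minimum is. The function name `rank_by_runtime` is made up for illustration; the numbers are copied from the table.

```python
# Timings copied from the post: CACHE_SIZE -> wall time in seconds
# (single runs on 8 CPUs, so the values are noisy).
timings = {
    0: 10259, 1000: 6627, 3000: 4845, 5000: 4709, 8000: 5406,
    10000: 4701, 15000: 4335, 20000: 8454, 30000: 6883,
    50000: 5162, 100000: 6951,
}


def rank_by_runtime(data):
    """Return (cache_size, seconds, slowdown_vs_best) tuples, fastest first."""
    best = min(data.values())
    ordered = sorted(data.items(), key=lambda item: item[1])
    return [(size, secs, secs / best) for size, secs in ordered]


ranked = rank_by_runtime(timings)
for size, secs, ratio in ranked:
    print(f"{size:>7d} {secs:>6d} s  {ratio:4.2f}x")
```

With these numbers, 15000 is the fastest setting, and 3000, 5000 and 10000 all land within roughly 12% of it, which is probably inside the noise of a single run. Repeating each setting a few times and comparing medians would give a more trustworthy picture.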
VASP on Altix
-
- Full Member
- Posts: 107
- Joined: Wed Aug 10, 2005 1:30 pm
- Location: Leiden, Netherlands
Last edited by tjf on Thu Aug 25, 2005 3:21 pm, edited 1 time in total.
-
- Newbie
- Posts: 12
- Joined: Tue Jun 14, 2005 1:13 pm
- License Nr.: 198
- Location: Argonne National Lab
Could you post your makefile?
Last edited by fish on Mon Oct 24, 2005 2:28 pm, edited 1 time in total.
-
- Newbie
- Posts: 1
- Joined: Fri Oct 14, 2005 7:23 pm
Hi,
You should set CACHE_SIZE to the L2 cache size of the Itanium2 processor, which is 256 kB. That corresponds to a CACHE_SIZE value of 16000.
Also, I recommend using either the built-in FFT routines from Jurgen Furthmuller or the FFT routines from SGI's SCSL scientific library (compile with -DFFT_SCSL to use the latter).
I (SGI) have a patch on top of VASP 4.6.26/27/28 that has been sent to Georg Kresse. Among other performance enhancements, it contains an alternative shared-memory implementation of the collective MPI routine that distributes the in-band wavefunctions. The Altix is a *very* fast machine for running VASP; please send me a private email if you would like to have the patch.
regards,
-Martin
--
Martin Hilgeman Chemical Applications Engineering
Phone: +31(0)30-6696885
SGI E-mail: hilgeman@sgi.com
The Netherlands URL: http://www.sgi.com/go/chembio
Last edited by hilgeman on Wed Nov 02, 2005 4:19 pm, edited 1 time in total.
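The step from 256 kB to a CACHE_SIZE of 16000 is not spelled out in the post. The sketch below works backwards from the two numbers given. The 16-byte unit (one double-precision complex number) is an ASSUMPTION that happens to fit those numbers; the thread does not confirm it, and `cache_size_for` is a hypothetical helper name.

```python
L2_BYTES = 256 * 1024          # 256 kB L2 cache, as stated in the post
SUGGESTED_CACHE_SIZE = 16000   # value recommended in the post

# Bytes per CACHE_SIZE unit implied by the two numbers above.
implied_unit = L2_BYTES / SUGGESTED_CACHE_SIZE

# ASSUMPTION (not confirmed by the thread): one CACHE_SIZE unit is one
# double-precision complex number, i.e. 16 bytes.
ASSUMED_UNIT_BYTES = 16


def cache_size_for(cache_bytes, unit_bytes=ASSUMED_UNIT_BYTES):
    """Number of whole units of unit_bytes that fit into cache_bytes."""
    return cache_bytes // unit_bytes


print(f"implied unit size: {implied_unit:.3f} bytes")
print(f"256 kB L2 -> {cache_size_for(L2_BYTES)}")
print(f"3 MB L3   -> {cache_size_for(3 * 1024 * 1024)}")
```

Under that assumption, 256 kB gives 16384, which rounds to the suggested 16000, while the 3 MB cache mentioned in the first post would correspond to a value far above anything in the timing table.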
-
- Full Member
- Posts: 107
- Joined: Wed Aug 10, 2005 1:30 pm
- Location: Leiden, Netherlands
My abbreviated but functional makefile is below.
It's a bit of a hack. YMMV.
.SUFFIXES: .inc .f .f90 .F
# all CPP processed fortran files have the extension .f90
SUFFIX=.f90
FC=ifort
FCL=$(FC)
CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
FFLAGS = -FR -lowercase -cm -w95 -tpp2 -safe_cray_ptr -stack_temps \
-I/usr/local/opt/fftw-3.0.1-fma/include
OFLAG=-O3 -unroll0 -ivdep_parallel -fno-alias -ip
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -O0
INLINE = $(OFLAG)
BLAS=-L/usr/local/opt/scs_beta/lib/ -lscs
LINK =
# serial FFT objects; overridden by the MPI FFT3D definition further down
FFT3D = fft3dfurth.o fft3dlib.o
CPP = $(CPP_) -DMPI -DHOST=\"LinuxAltix\" -DIFC -Duse_collective \
-Dkind8 -DNGZhalf -DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=500 -DPROC_GROUP=8
LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o \
-L/usr/local/opt/scs_beta/lib/ -lscs \
-lmpi
FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o
BASIC= symmetry.o symlib.o lattlib.o random.o
SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o setex.o radial.o \
pseudo.o mgrid.o mkpoints.o wave.o wave_mpi.o $(BASIC) \
nonl.o nonlr.o dfast.o choleski2.o \
mix.o charge.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o pot.o cl_shift.o force.o dos.o elf.o \
tet.o hamil.o steep.o \
chain.o dyna.o relativistic.o LDApU.o sphpro.o paw.o us.o \
ebs.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
dipol.o xclib.o chgloc.o subrot.o optreal.o davidson.o \
edtest.o electron.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o \
elpol.o setlocalpp.o
INC=
vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp $(LINK) main.o $(SOURCE) $(FFT3D) $(LIB)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)
clean:
	-rm -f *.f *.o *.L ; touch *.F
main.o: main$(SUFFIX)
	$(FC) $(FFLAGS) $(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)
makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS) $(DEBUG) $(INCS) -c makeparam$(SUFFIX)
makeparam$(SUFFIX): makeparam.F main.F
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F
$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)
fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)
fftmpi.o : fftmpi.F
	$(CPP)
	$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)
symlib.o : symlib.F
	$(CPP)
	$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)
symmetry.o : symmetry.F
	$(CPP)
	$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)
dynbr.o : dynbr.F
	$(CPP)
	$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)
broyden.o : broyden.F
	$(CPP)
	$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)
us.o : us.F
	$(CPP)
	$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)
wave.o : wave.F
	$(CPP)
	$(FC) $(FFLAGS) -O0 -c $*$(SUFFIX)
LDApU.o : LDApU.F
	$(CPP)
	$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)
potex1.o : potex1.F
	$(CPP)
	$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)
.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
Last edited by tjf on Fri Nov 04, 2005 2:21 pm, edited 1 time in total.
-
- Full Member
- Posts: 107
- Joined: Wed Aug 10, 2005 1:30 pm
- Location: Leiden, Netherlands
Well, it was functional before the forum software butchered it.
The important stuff is at the top. I also reduced the optimisation for potex1 and I think I added those generic rules at the bottom, but I may have picked them up from somewhere else. They don't seem to be in the efc makefile.
Last edited by tjf on Fri Nov 04, 2005 5:36 pm, edited 1 time in total.